Part I
Introduction 


I.1 What Is Mathematics About?


It is notoriously hard to give a satisfactory answer to 
the question, “What is mathematics?” The approach of 
this book is not to try. Rather than giving a definition of 
mathematics, the intention is to give a good idea of what 
mathematics is by describing many of its most important
concepts, theorems, and applications. Nevertheless,
to make sense of all this information it is useful to be 
able to classify it somehow. 

The most obvious way of classifying mathematics is by 
its subject matter, and that will be the approach of this 
brief introductory section and the longer section entitled
SOME FUNDAMENTAL MATHEMATICAL DEFINITIONS [I.3]. However, it
is not the only way, and not even obviously the best way.
Another approach is to try to classify the kinds of
questions that mathematicians like to
think about. This gives a usefully different view of the 
subject: it often happens that two areas of mathematics 
that appear very different if you pay attention to their 
subject matter are much more similar if you look at the 
kinds of questions that are being asked. The last section
of part I, entitled THE GENERAL GOALS OF MATHEMATICAL
RESEARCH [I.4], looks at the subject from this
point of view. At the end of that article there is a brief 
discussion of what one might regard as a third classi- 
fication, not so much of mathematics itself but of the 
content of a typical article in a mathematics journal. As 
well as theorems and proofs, such an article will contain 
definitions, examples, lemmas, formulas, conjectures, 
and so on. The point of that discussion will be to say 
what these words mean and why the different kinds of 
mathematical output are important. 

1 Algebra, Geometry, and Analysis 

Although any classification of the subject matter of 
mathematics must immediately be hedged around with 
qualifications, there is a crude division that undoubtedly 
works well as a first approximation, namely the division
of mathematics into algebra, geometry, and analysis. So
let us begin with this, and then qualify it later. 

1.1 Algebra versus Geometry 

Most people who have done some high-school mathe- 
matics will think of algebra as the sort of mathemat- 
ics that results when you substitute letters for num- 
bers. Algebra will often be contrasted with arithmetic, 
which is a more direct study of the numbers themselves. 
So, for example, the question, “What is 3 × 7?” will be
thought of as belonging to arithmetic, while the question,
“If x + y = 10 and xy = 21, then what is the value of the
larger of x and y?” will be regarded as a
piece of algebra. This contrast is less apparent in more 
advanced mathematics for the simple reason that it is 
very rare for numbers to appear without letters to keep 
them company. 
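
As a small illustration of the algebra question above: since (t - x)(t - y) = t^2 - (x + y)t + xy, the two unknowns are the roots of t^2 - 10t + 21 = 0. The sketch below applies the quadratic formula; the function name `larger_of_pair` is invented for this example and is not part of the text.

```python
import math

def larger_of_pair(total, product):
    """Return the larger of two real numbers with the given sum and product.

    The two numbers are the roots of t^2 - total*t + product = 0.
    """
    discriminant = total * total - 4 * product
    if discriminant < 0:
        raise ValueError("no real pair has this sum and product")
    return (total + math.sqrt(discriminant)) / 2

print(larger_of_pair(10, 21))
```

The smaller value is obtained by subtracting the square root instead of adding it.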

There is, however, a different contrast, between alge- 
bra and geometry, which is much more important at an 
advanced level. The high-school conception of geometry 
is that it is the study of shapes such as circles, trian- 
gles, cubes, and spheres together with concepts such 
as rotations, reflections, symmetries, and so on. Thus, 
the objects of geometry, and the processes that they 
undergo, have a much more visual character than the 
equations of algebra. 

This contrast persists right up to the frontiers of mod- 
ern mathematical research. Some parts of mathemat- 
ics involve manipulating symbols according to certain 
rules: for example, a true equation remains true if you 
“do the same to both sides.” These parts would typically 
be thought of as algebraic, whereas other parts are con- 
cerned with concepts that can be visualized, and these 
are typically thought of as geometrical. 

However, a distinction like this is never simple. If you 
look at a typical research paper in geometry, will it be full 
of pictures? Almost certainly not. In fact, the methods
used to solve geometrical problems very often involve 
a great deal of symbolic manipulation, although good 
powers of visualization may be needed to find and use 
these methods and pictures will typically underlie what
is going on. As for algebra, is it “mere” symbolic manip- 
ulation? Not at all: very often one solves an algebraic 
problem by finding a way to visualize it.

As an example of visualizing an algebraic problem, 
consider how one might justify the rule that if a and 
b are positive integers then ab = ba. It is possible to 
approach the problem as a pure piece of algebra (per- 
haps proving it by induction), but the easiest way to con- 
vince yourself that it is true is to imagine a rectangular 
array that consists of a rows with b objects in each row. 
The total number of objects can be thought of as a lots 
of b, if you count it row by row, or as b lots of a, if you 
count it column by column. Therefore, ab = ba. Similar 
justifications can be given for other basic rules such as
a(b + c) = ab + ac and a(bc) = (ab)c.
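
The rectangular-array picture can be mimicked in a few lines of code. The sketch below builds an array with a rows and b objects per row and counts it both ways; it is a check of particular cases, not a proof, and the function name `count_both_ways` is made up for illustration.

```python
def count_both_ways(a, b):
    """Count the objects in an a-by-b array row by row and column by column."""
    grid = [[1] * b for _ in range(a)]  # a rows with b objects in each row
    # Row by row: a lots of b.
    by_rows = sum(sum(row) for row in grid)
    # Column by column: b lots of a.
    by_columns = sum(sum(grid[i][j] for i in range(a)) for j in range(b))
    return by_rows, by_columns

print(count_both_ways(3, 5))
```

Both counts visit every object exactly once, which is why they must agree.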

In the other direction, it turns out that a good way 
of solving many geometrical problems is to “convert 
them into algebra.” The most famous way of doing this 
is to use Cartesian coordinates. For example, suppose 
that you want to know what happens if you reflect a 
circle about a line L through its center, then rotate it 
through 40° counterclockwise, and then reflect it once
more about the same line L. One approach is to visualize 
the situation as follows. 

Imagine that the circle is made of a thin piece of wood. 
Then instead of reflecting it about the line you can rotate
it through 180° about L (using the third dimension). The 
result will be upside down, but this does not matter if 
you simply ignore the thickness of the wood. Now if you 
look up at the circle from below while it is rotated coun- 
terclockwise through 40°, what you will see is a circle 
being rotated clockwise through 40°. Therefore, if you 
then turn it back the right way up, by rotating about L 
once again, the total effect will have been a clockwise
rotation through 40°. 

Mathematicians vary widely in their ability and willing- 
ness to follow an argument like that one. If you cannot 
quite visualize it well enough to see that it is definitely
correct, then you may prefer an algebraic approach, 
using the theory of linear algebra and matrices (which 
will be discussed in more detail in [I.3 §4.2]). To begin
with, one thinks of the circle as the set of all pairs of
numbers $(x, y)$ such that $x^2 + y^2 = 1$. The two
transformations, reflection in a line through the center of
the circle and rotation through an angle $\theta$, can both
be represented by 2 × 2 matrices, which are arrays of numbers
of the form $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$.
There is a slightly complicated, but purely algebraic, rule
for multiplying matrices together, and it is designed to have
the property that if a matrix A represents a transformation R
(such as a reflection) and a matrix B represents a
transformation T, then the product AB represents the
transformation that results when you first do T and then R.
Therefore, one can solve
the problem above by writing down the matrices that 
correspond to the transformations, multiplying them 
together, and seeing what transformation corresponds 
to the product. In this way, the geometrical problem has 
been converted into algebra and solved algebraically. 
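
The matrix calculation described above can be sketched as follows. The choice of the x-axis for the line L is an assumption made here for concreteness (the text leaves L arbitrary), and the helper names are invented for the example.

```python
import math

def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation(degrees):
    """Matrix of a counterclockwise rotation about the origin."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]

# Assumption: L is the x-axis, so the reflection sends (x, y) to (x, -y).
REFLECTION = [[1.0, 0.0], [0.0, -1.0]]

def reflect_rotate_reflect(degrees):
    # The rightmost factor acts first: reflect, then rotate, then reflect.
    return matmul(REFLECTION, matmul(rotation(degrees), REFLECTION))

combined = reflect_rotate_reflect(40)
clockwise = rotation(-40)
print(all(math.isclose(combined[i][j], clockwise[i][j], abs_tol=1e-12)
          for i in range(2) for j in range(2)))
```

The product agrees with the matrix of a clockwise rotation through 40°, matching the conclusion of the visual argument.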

Thus, while one can draw a useful distinction between 
algebra and geometry, one should not imagine that the 
boundary between the two is sharply defined. In fact,
one of the major branches of mathematics is even called
ALGEBRAIC GEOMETRY [IV.7]. And as the above examples
illustrate, it is often possible to translate a piece of
mathematics from algebra into geometry or vice versa.
Nevertheless, there is a definite difference between algebraic
and geometric methods of thinking (one more symbolic
and one more pictorial), and this can have a profound
influence on the subjects that mathematicians choose
to pursue. 

1.2 Algebra versus Analysis 

The word “analysis,” used to denote a branch of math- 
ematics, is not one that features at high-school level. 
However, the word “calculus” is much more familiar,
and differentiation and integration are good examples of
mathematics that would be classified as analysis rather
than algebra or geometry. The reason for this is that they
involve limiting processes. For example, the derivative of
a function f at a point x is the limit of the gradients
of a sequence of chords of the graph of f, and the area
of a shape with a curved boundary is defined to be the
limit of the areas of rectilinear regions that fill up more
and more of the shape. (These concepts are discussed in
much more detail in [I.3 §5].)

Thus, as a first approximation, one might say that a 
branch of mathematics belongs to analysis if it involves 
limiting processes, whereas it belongs to algebra if you 
can get to the answer after just a finite sequence of steps.
However, here again the first approximation is so crude
as to be misleading, and for a similar reason: if one looks
more closely one finds that it is not so much branches
of mathematics that should be classified into analysis or
algebra, but mathematical techniques. 

Given that we cannot write out infinitely long proofs, 
how can we hope to prove anything about limiting processes?
To answer this, let us look at the justification for the
simple statement that the derivative of $x^3$ is $3x^2$. The
usual reasoning is that the gradient of the chord of the
line joining the two points $(x, x^3)$ and $(x+h, (x+h)^3)$ is

$$\frac{(x+h)^3 - x^3}{x + h - x},$$

which works out as $3x^2 + 3xh + h^2$. As h “tends to zero,”
this gradient “tends to $3x^2$,” so we say that the gradient
at x is $3x^2$. But what if we wanted to be a bit more
careful? For instance, if x is very large, are we really
justified in ignoring the term $3xh$?

To reassure ourselves on this point, we do a small
calculation to show that, whatever x is, the error
$3xh + h^2$ can be made arbitrarily small, provided only that
h is sufficiently small. Here is one way of going about it.
Suppose we fix a small positive number $\epsilon$, which
represents the error we are prepared to tolerate. Then if
$|h| \le \epsilon/(6x)$ (taking x to be positive), we know
that $|3xh|$ is at most $\epsilon/2$. If in addition we know
that $|h| \le \sqrt{\epsilon/2}$, then we also know that
$h^2 \le \epsilon/2$. So, provided that $|h|$ is smaller than
the minimum of the two numbers $\epsilon/(6x)$ and
$\sqrt{\epsilon/2}$, the difference between
$3x^2 + 3xh + h^2$ and $3x^2$ will be at most $\epsilon$.
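
The estimate above can be tried out numerically. The following is a minimal sketch, assuming x > 0; the function names are invented for illustration. The last two lines use exact rational arithmetic to confirm, for one choice of x and h, that the chord gradient differs from $3x^2$ by exactly $3xh + h^2$.

```python
from fractions import Fraction
import math

def chord_gradient(x, h):
    """Gradient of the chord of y = x^3 between x and x + h."""
    return ((x + h) ** 3 - x ** 3) / h

def error_term(x, h):
    """The difference 3xh + h^2 between the chord gradient and 3x^2."""
    return 3 * x * h + h * h

def tolerance_for_h(x, eps):
    """Bound on |h| taken from the argument above (assumes x > 0, eps > 0)."""
    return min(eps / (6 * x), math.sqrt(eps / 2))

x, eps = 1000.0, 1e-3
h = tolerance_for_h(x, eps)
print(abs(error_term(x, h)) <= eps)

# Exact arithmetic avoids the rounding error of floating-point subtraction.
xq, hq = Fraction(1000), Fraction(1, 10**7)
print(chord_gradient(xq, hq) - 3 * xq**2 == error_term(xq, hq))
```

Even for a large value such as x = 1000, a sufficiently small h keeps the error within the chosen tolerance; the larger x is, the smaller h has to be.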

There are two features of the above argument that 
are typical of analysis. First, although the statement we 
wished to prove was about a limiting process, and was 
therefore “infinitary,” the actual work that we needed to
do to prove it was entirely finite. Second, the nature of
that work was to find sufficient conditions for a certain
fairly simple inequality (the inequality $|3xh + h^2| \le \epsilon$)
to be true. 

Let us illustrate this second feature with another 
example: a proof that $x^4 - x^2 - 6x + 10$ is positive for
every real number x. Here is an “analyst’s argument.”
Note first that if $x \le -1$ then $x^4 \ge x^2$ and
$10 - 6x > 0$, so the result is certainly true in this case.
If $-1 \le x \le 1$, then $|x^4 - x^2 - 6x|$ cannot be greater
than $x^4 + x^2 + 6|x|$, which is at most 8, so
$x^4 - x^2 - 6x \ge -8$, which implies that
$x^4 - x^2 - 6x + 10 \ge 2$. If $1 \le x \le \frac{3}{2}$, then
$x^4 \ge x^2$ and $6x \le 9$, so $x^4 - x^2 - 6x + 10 \ge 1$.
If $\frac{3}{2} \le x \le 2$, then $x^2 \ge \frac{9}{4} > 2$, so
$x^4 - x^2 = x^2(x^2 - 1) > 2$. Also, $6x \le 12$, so
$10 - 6x \ge -2$. Therefore, $x^4 - x^2 - 6x + 10 > 0$.
Finally, if $x \ge 2$, then
$x^4 - x^2 = x^2(x^2 - 1) \ge 3x^2 \ge 6x$, from which it
follows that $x^4 - x^2 - 6x + 10 \ge 10$.

The above argument is somewhat long, but each step 
consists in proving a rather simple inequality— this is 
the sense in which the proof is typical of analysis. Here, 
for contrast, is an “algebraist's proof.” One simply points 
out that $x^4 - x^2 - 6x + 10$ is equal to $(x^2 - 1)^2 + (x - 3)^2$,
and is therefore always positive. 
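
The identity behind the algebraist's proof can be checked mechanically by expanding the two squares. The sketch below represents polynomials as coefficient lists (lowest degree first); this representation and the helper names are choices made for the example.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add two polynomials, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

x2_minus_1 = [-1, 0, 1]  # x^2 - 1
x_minus_3 = [-3, 1]      # x - 3
sum_of_squares = poly_add(poly_mul(x2_minus_1, x2_minus_1),
                          poly_mul(x_minus_3, x_minus_3))
target = [10, -6, -1, 0, 1]  # x^4 - x^2 - 6x + 10
print(sum_of_squares == target)
```

Positivity (rather than mere nonnegativity) follows because the two squares cannot vanish together: $x - 3 = 0$ forces $x = 3$, where $x^2 - 1 = 8$.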

This may make it seem as though, given the choice 
between analysis and algebra, one should go for alge- 
bra. After all, the algebraic proof was much shorter, and 
makes it obvious that the function is always positive. 


However, although there were several steps to the ana- 
lyst’s proof, they were all easy, and the brevity of the 
algebraic proof is misleading since no clue has been
given about how the equivalent expression for
$x^4 - x^2 - 6x + 10$ was found. And in fact, the general
question of
when a polynomial can be written as a sum of squares of 
other polynomials turns out to be an interesting and dif- 
ficult one (particularly when the polynomials have more 
than one variable). 

There is also a third, hybrid approach to the problem, 
which is to use calculus to find the points where
$x^4 - x^2 - 6x + 10$ is minimized. The idea would be to
calculate the derivative $4x^3 - 2x - 6$ (an algebraic process,
justified by an analytic argument), find its roots (algebra),
and check that the values of $x^4 - x^2 - 6x + 10$ at the roots
of the derivative are positive. However, though the method is
a good one for many problems, in this case it is tricky
because the cubic $4x^3 - 2x - 6$ does not have integer
roots. But one could use an analytic argument to find
small intervals inside which the minimum must occur,
and that would then reduce the number of cases that had
to be considered in the first, purely analytic, argument.
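
One way to carry out the "small intervals" idea numerically is bisection. The sketch below is an illustration rather than a proof; it relies on the observation that the derivative is negative at x = 1 and positive at x = 1.5, so a root lies between them. The function names and the tolerance are choices made here.

```python
def p(x):
    return x**4 - x**2 - 6*x + 10

def dp(x):
    """Derivative of p."""
    return 4*x**3 - 2*x - 6

def bisect_root(f, lo, hi, tol=1e-12):
    """Locate a root of f in [lo, hi], assuming f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical = bisect_root(dp, 1.0, 1.5)
print(critical, p(critical))
```

The cubic has only this one real root (its local maximum, at $x = -1/\sqrt{6}$, is negative), so the value printed is an approximation to the global minimum of the quartic, and it is comfortably positive.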

As this example suggests, although analysis often 
involves limiting processes and algebra usually does not, 
a more significant distinction is that algebraists like to
work with exact formulas and analysts use estimates. Or, 
to put it even more succinctly, algebraists like equalities 
and analysts like inequalities. 

2 The Main Branches of Mathematics 

Now that we have discussed the differences between 
algebraic, geometrical, and analytical thinking, we are 
ready for a crude classification of the subject matter of
mathematics. We face a potential confusion, because the 
words “algebra,” “geometry,” and “analysis” refer both to 
specific branches of mathematics and to ways of think- 
ing that cut across many different branches. Thus, it 
makes sense to say (and it is true) that some branches 
of analysis are more algebraic (or geometrical) than others;
similarly, there is no paradox in the fact that algebraic
topology is almost entirely algebraic and geometrical in
character, even though the objects it studies, topological
spaces, are part of analysis. In this section, we
shall think primarily in terms of subject matter, but it 
is important to keep in mind the distinctions of the previous
section and be aware that they are in some ways
more fundamental. Our descriptions will be very brief: 
further reading about the main branches of mathemat- 
ics can be found in parts II and IV, and more specific 
points are discussed in parts III and V.

2.1 Algebra

The word “algebra,” when it denotes a branch of math- 
ematics, means something more specific than manipu- 
lation of symbols and a preference for equalities over 
inequalities. Algebraists are concerned with number sys- 
tems, polynomials, and more abstract structures such 
as groups, fields, vector spaces, and rings (discussed
in some detail in SOME FUNDAMENTAL MATHEMATICAL
DEFINITIONS [I.3]). Historically, the abstract structures
emerged as generalizations from concrete instances. For
instance, there are important analogies between the set
of all integers and the set of all polynomials with rational
(for example) coefficients, which are brought out by the
fact that they are both examples of algebraic structures
known as Euclidean domains. If one has a good
understanding of Euclidean domains, one can apply this 
understanding to integers and polynomials. 
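
To make the analogy concrete, here is a hedged sketch in which a single implementation of Euclid's algorithm computes greatest common divisors both of integers and of polynomials with rational coefficients. The representation of polynomials as coefficient lists (highest degree first) and all function names are assumptions of this example, not something taken from the text.

```python
from fractions import Fraction

def euclid_gcd(a, b, divide, is_zero):
    """Euclid's algorithm, written once for any Euclidean domain.

    `divide(a, b)` must return (quotient, remainder); `is_zero` tests for 0.
    """
    while not is_zero(b):
        _, remainder = divide(a, b)
        a, b = b, remainder
    return a

def poly_divmod(p, q):
    """Long division of polynomials with rational coefficients.

    Polynomials are coefficient lists, highest degree first, with a
    nonzero leading entry; the zero polynomial is the empty list.
    """
    rem = [Fraction(c) for c in p]
    quotient = []
    while len(rem) >= len(q):
        factor = rem[0] / Fraction(q[0])
        quotient.append(factor)
        for i, c in enumerate(q):
            rem[i] -= factor * c
        rem.pop(0)  # the leading coefficient is now zero
    while rem and rem[0] == 0:
        rem.pop(0)
    return quotient, rem

def monic(p):
    """Scale a nonzero polynomial so that its leading coefficient is 1."""
    return [Fraction(c) / Fraction(p[0]) for c in p]

# Integers: gcd(84, 36)
print(euclid_gcd(84, 36, divmod, lambda n: n == 0))

# Polynomials: gcd(x^2 - 1, x^2 - 2x + 1), normalized to be monic
g = euclid_gcd([1, 0, -1], [1, -2, 1], poly_divmod, lambda p: len(p) == 0)
print(monic(g) == [1, -1])
```

The only things that change between the two calls are the division-with-remainder routine and the test for zero, which is one way of seeing what the two number systems have in common.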

This highlights a contrast that appears in many 
branches of mathematics, namely the distinction
between general, abstract statements and particular, 
concrete ones. One algebraist might be thinking about 
groups, say, in order to understand a particular rather 
complicated group of symmetries, while another might 
be interested in the general theory of groups on the 
grounds that they are a fundamental class of mathemat- 
ical objects. The development of abstract algebra from 
its concrete beginnings is discussed in THE ORIGINS OF
MODERN ALGEBRA [II.3].

A supreme example of a theorem of the first kind
is THE INSOLUBILITY OF THE QUINTIC [V.24]: the result
that there is no formula for the roots of a quintic
polynomial in terms of its coefficients. One proves this
theorem by analyzing symmetries associated with the roots
of a polynomial, and understanding the group that is 
formed by them. This concrete example of a group (or 
rather, class of groups, one for each polynomial) played 
a very important part in the development of the abstract 
theory of groups. 

As for the second kind of theorem, a good example
is THE CLASSIFICATION OF FINITE SIMPLE GROUPS [V.8],
which describes the basic building blocks out of which
any finite group can be built.

Algebraic structures appear throughout mathematics, 
and there are many applications of algebra to other 
areas, such as number theory, geometry, and even math- 
ematical physics. 

2.2 Number Theory 

Number theory is largely concerned with properties of 
the set of positive integers, and as such has a considerable
overlap with algebra. But a simple example that
illustrates the difference between a typical question in 
algebra and a typical question in number theory is provided
by the equation $13x - 7y = 1$. An algebraist would simply
note that there is a one-parameter family of solutions: if
$y = \lambda$ then $x = (1 + 7\lambda)/13$, so the general
solution is $(x, y) = ((1 + 7\lambda)/13, \lambda)$. A number
theorist would be interested in integer solutions, and would
therefore work out for which integers $\lambda$ the number
$1 + 7\lambda$ is a multiple of 13. (The answer is that
$1 + 7\lambda$ is a multiple of 13 if and only if $\lambda$
has the form $13m + 11$ for some integer m.) Other topics
studied by
number theorists are properties of special numbers such 
as primes. 
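
The number theorist's answer for the equation 13x - 7y = 1 can be checked by a short search. This is a sketch over a finite range of y (the range and the function name are arbitrary choices for the example), so it confirms the pattern without proving it.

```python
def integer_solutions(y_range):
    """Integer pairs (x, y) with 13x - 7y = 1 and y in y_range."""
    solutions = []
    for y in y_range:
        if (1 + 7 * y) % 13 == 0:
            solutions.append(((1 + 7 * y) // 13, y))
    return solutions

found = integer_solutions(range(-30, 31))
print(found)
# Every y that works leaves remainder 11 on division by 13.
print(all(y % 13 == 11 for _, y in found))
```

Each pair found has y of the form 13m + 11, in agreement with the statement above.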

However, this description does not do full justice to 
modern number theory, which has developed into a 
highly sophisticated subject. Most number theorists are 
not directly trying to solve equations in integers; instead
they are trying to understand structures that were origi- 
nally developed to study such equations but which then 
took on a life of their own and became objects of study 
in their own right. In some cases, this process has hap- 
pened several times, so the phrase “number theory” 
gives a very misleading picture of what some number 
theorists do. Nevertheless, even the most abstract parts 
of the subject can have down-to-earth applications: a 
notable example is Andrew Wiles’s famous proof of 
FERMAT’S LAST THEOREM [V.12].

Interestingly, in view of the discussion earlier, number
theory has two fairly distinct subbranches, known
as ALGEBRAIC NUMBER THEORY [IV.3] and ANALYTIC
NUMBER THEORY [IV.4]. As a rough rule of thumb, the
study of equations in integers leads to algebraic number 
theory and the study of prime numbers leads to analytic 
number theory, but the true picture is of course more 
complicated. 

2.3 Geometry 

A central object of study is the manifold, which is discussed
in [I.3 §6.9]. Manifolds are higher-dimensional
generalizations of shapes like the surface of a sphere, 
which have the property that any small portion of them 
looks fairly flat but the whole may be curved in compli- 
cated ways. Most people who call themselves geometers 
are studying manifolds in one way or another. As with 
algebra, some will be interested in particular manifolds 
and others in the more general theory. 

Within the study of manifolds, one can attempt a fur- 
ther classification, according to when two manifolds are 
regarded as “genuinely distinct.” A topologist regards
two objects as the same if one can be continuously
deformed, or “morphed,” into the other; thus, for exam- 
ple, an apple and a pear would count as the same for 
a topologist. This means that relative distances are not 
important to topologists, since one can change them by 
suitable continuous stretches. A differential topologist 
asks for the deformations to be “smooth” (which means
“sufficiently differentiable”). This results in a finer
classification of manifolds and a different set of problems. At
the other, more “geometrical,” end of the spectrum are
mathematicians who are much more concerned by the 
precise nature of the distances between points on a man- 
ifold (a concept that would not make sense to a topolo- 
gist) and in auxiliary structures that one can associate 
with a manifold. See RIEMANNIAN METRICS [I.3 §6.10]
and RICCI FLOW [III.80] for some indication of what the
more geometrical side of geometry is like. 

2.4 Algebraic Geometry 

As its name suggests, algebraic geometry does not have 
an obvious place in the above classification, so it is eas- 
ier to discuss it separately. Algebraic geometers also 
study manifolds, but with the important difference that 
their manifolds are defined using polynomials. (A simple 
example of this is the surface of a sphere, which can be 
defined as the set of all $(x, y, z)$ such that
$x^2 + y^2 + z^2 = 1$.) This means that algebraic geometry is
algebraic in the
sense that it is “all about polynomials” but geometric in 
the sense that the set of solutions of a polynomial in 
several variables is a geometric object. 

An important part of algebraic geometry is the study 
of singularities. Often the set of solutions to a system of 
polynomial equations is similar to a manifold, but has a 
few exceptional, singular points. For example, the equation
$x^2 = y^2 + z^2$ defines a (double) cone, which has its
vertex at the origin (0, 0, 0). If you look at a small enough 
neighborhood of a point x on the cone, then, provided 
x is not (0, 0, 0), the neighborhood will resemble a flat 
plane. However, if x is (0, 0, 0), then no matter how small 
the neighborhood is, you will still see the vertex of the 
cone. Thus, (0, 0, 0) is a singularity. (This means that the 
cone is not actually a manifold, but a “manifold with a 
singularity.”) 

The interplay between algebra and geometry is part 
of what gives algebraic geometry its fascination. A fur- 
ther impetus to the subject comes from its connections 
to other branches of mathematics. There is a particu- 
larly close connection with number theory, explained in 
ARITHMETIC GEOMETRY [IV.6]. More surprisingly, there
are important connections between algebraic geometry
and mathematical physics. See MIRROR SYMMETRY
[IV.14] for an account of some of these.

2.5 Analysis 

Analysis comes in many different flavors. A major 
topic is the study of PARTIAL DIFFERENTIAL EQUATIONS
[IV.16]. This began because partial differential equations
were found to govern many physical processes,
such as motion in a gravitational field. But
they arise in purely mathematical contexts as well— 
particularly in geometry— so partial differential equa- 
tions give rise to a big branch of mathematics with many 
subbranches and links to many other areas. 

Like algebra, analysis has some abstract structures 
that are central objects of study, such as BANACH SPACES
[III.64], HILBERT SPACES [III.37], C*-ALGEBRAS [IV.19 §3],
and VON NEUMANN ALGEBRAS [IV.19 §2]. These are all
infinite-dimensional VECTOR SPACES [I.3 §2.3], and the
last two are “algebras,” which means that one can multi- 
ply their elements together as well as adding them and 
multiplying them by scalars. Because these structures 
are infinite dimensional, studying them involves limit- 
ing arguments, which is why they belong to analysis. 
However, the extra algebraic structure of C*-algebras
and von Neumann algebras means that in those areas 
substantial use is made of algebraic tools as well. And 
as the word “space” suggests, geometry also has a very 
important role. 

DYNAMICS [IV.15] is another significant branch of
analysis. It is concerned with what happens when you 
take a simple process and do it over and over again. 
For example, if you take a complex number $z_0$, then let
$z_1 = z_0^2 + 2$, and then let $z_2 = z_1^2 + 2$, and so on,
then what is the limiting behavior of the sequence
$z_0, z_1, z_2, \ldots$? Does it head off to infinity or stay
in some bounded region? The answer turns out to depend in a
complicated way on the original number $z_0$. The study of how
it depends on $z_0$ is a question in dynamics.
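
The iteration just described is easy to experiment with. The sketch below reports the first step at which an iterate leaves the disc of radius 2; that radius is a choice made here, justified by the fact that once $|z| > 2$ we have $|z^2 + 2| \ge |z|^2 - 2 > |z|$, so the sequence keeps growing. The function name and step limit are likewise assumptions of the example.

```python
import math

def escape_step(z0, max_steps=50, radius=2.0):
    """Iterate z -> z*z + 2 starting from z0.

    Returns the first n with |z_n| > radius, or None if none of
    z_0, ..., z_max_steps leaves the disc.
    """
    z = complex(z0)
    for n in range(max_steps + 1):
        if abs(z) > radius:
            return n
        z = z * z + 2
    return None

print(escape_step(0))

# z = (1 + i*sqrt(7))/2 satisfies z = z*z + 2, so in exact arithmetic the
# sequence starting there never moves. The fixed point is repelling, so
# floating-point rounding would eventually push the computed iterates away;
# a short run stays put.
fixed_point = complex(0.5, math.sqrt(7) / 2)
print(escape_step(fixed_point, max_steps=10))
```

Starting from 0 the sequence 0, 2, 6, 38, ... escapes almost immediately, whereas the fixed point gives a bounded (constant) sequence, a small instance of the sensitive dependence on $z_0$ mentioned above.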

Sometimes the process to be repeated is an “infinites- 
imal” one. For example, if you are told the positions, 
velocities, and masses of all the planets in the solar sys- 
tem at a particular moment (as well as the mass of the 
Sun), then there is a simple rule that tells you how the 
positions and velocities will be different an instant later. 
Later, the positions and velocities have changed, so the 
calculation changes; but the basic rule is the same, so 
one can regard the whole process as applying the same 
simple infinitesimal process infinitely many times. The 
correct way to formulate this is by means of partial dif- 
ferential equations and therefore much of dynamics is 
concerned with the long-term behavior of solutions to
these. 

2.6 Logic 

The word “logic” is sometimes used as a shorthand 
for all branches of mathematics that are concerned 
with fundamental questions about mathematics itself, 
notably SET THEORY [IV.1], CATEGORY THEORY [III.8],
MODEL THEORY [IV.2], and logic in the narrower sense of
“rules of deduction.” Among the triumphs of set theory
are GÖDEL’S INCOMPLETENESS THEOREMS [V.18] and Paul
Cohen’s proof of THE INDEPENDENCE OF THE CONTINUUM
HYPOTHESIS [V.21]. Gödel’s theorems in particular
had a dramatic effect on philosophical perceptions of 
mathematics, though now that it is understood that not 
every mathematical statement has a proof or disproof 
most mathematicians carry on much as before, since 
most statements they encounter do tend to be decid- 
able. However, set theorists are a different breed. Since 
Gödel and Cohen, many further statements have been
shown to be undecidable, and many new axioms have 
been proposed that would make them decidable. Thus, 
decidability is now studied for mathematical rather than 
philosophical reasons. 

Category theory is another subject that began as 
a study of the processes of mathematics and then 
became a mathematical subject in its own right. It differs 
from set theory in that its focus is less on mathemati- 
cal objects themselves than on what is done to those 
objects — in particular, the maps that transform one to 
another. 

A model for a collection of axioms is a mathematical 
structure for which those axioms, suitably interpreted, 
are true. For example, any concrete example of a group 
is a model for the axioms of group theory. Set theo- 
rists study models of set-theoretic axioms, and these 
are essential to the proofs of the famous theorems men- 
tioned above, but the notion of model is more widely 
applicable and has led to important discoveries in fields
well outside set theory. 

2.7 Combinatorics 

There are various ways in which one can try to define
combinatorics. None is satisfactory on its own, but
together they give some idea of what the subject is like.
A first definition is that combinatorics is about counting
things. For example, how many ways are there of filling
an n × n square grid with 0s and 1s if you are allowed at
most two 1s in each row and at most two 1s in each column?
Because this problem asks us to count something,
it is, in a rather simple sense, combinatorial. 
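
For very small n the counting problem above can be settled by brute force. The sketch below simply tries every grid whose rows are admissible and checks the columns; it says nothing about a general formula, and its running time grows far too quickly to be useful beyond small n. The function name is invented for the example.

```python
from itertools import product

def count_grids(n):
    """Count n-by-n grids of 0s and 1s with at most two 1s per row and column."""
    # Rows that already satisfy the row condition.
    rows = [r for r in product((0, 1), repeat=n) if sum(r) <= 2]
    count = 0
    for grid in product(rows, repeat=n):
        # zip(*grid) yields the columns of the grid.
        if all(sum(col) <= 2 for col in zip(*grid)):
            count += 1
    return count

for n in range(1, 4):
    print(n, count_grids(n))
```

For n = 1 and n = 2 the constraint is vacuous, so every grid counts; the first interesting case is n = 3.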

Combinatorics is sometimes called “discrete math- 
ematics” because it is concerned with “discrete” as 
opposed to “continuous” structures. Roughly speaking, 
an object is discrete if it consists of points that are 
isolated from each other and continuous if you can 
move from one point to another without making sud- 
den jumps. (A good example of a discrete structure is 
the integer lattice $\mathbb{Z}^2$, which is the grid consisting of
all points in the plane with integer coordinates, and a 
good example of a continuous one is the surface of a 
sphere.) There is a close affinity between combinatorics
and theoretical computer science (which deals with the
quintessentially discrete structure of sequences of 0s
and 1s), and combinatorics is sometimes contrasted with
analysis, though in fact there are several connections
between the two. 

A third definition is that combinatorics is con- 
cerned with mathematical structures that have “few con- 
straints.” This idea helps to explain why number theory, 
despite the fact that it studies (among other things)
the distinctly discrete set of all positive integers, is not 
considered a branch of combinatorics. 

In order to illustrate this last contrast, here are two 
somewhat similar problems, both about positive inte- 
gers. 

(i) Is there a positive integer that can be written in a 
thousand different ways as a sum of two squares? 

(ii) Let $a_1, a_2, a_3, \ldots$ be a sequence of positive
integers, and suppose that each $a_n$ lies between $n^2$ and
$(n + 1)^2$. Will there always be a positive integer that can
be written in a thousand different ways as a sum of 
two numbers from the sequence? 

The first question counts as number theory, since it 
concerns a very specific sequence (the sequence of
squares), and one would expect to use properties of this
special set of numbers in order to determine the answer,
which turns out to be yes (see footnote 1).

The second question concerns a far less structured
sequence. All we know about $a_n$ is its rough size (it is
fairly close to $n^2$), but we know nothing about its more
detailed properties, such as whether it is a prime, or a
perfect cube, or a power of 2, etc. For this reason, the
second problem belongs to combinatorics. The answer
is not known. If the answer turns out to be yes, then it
will show that, in a sense, the number theory in the first
problem was an illusion and that all that really mattered
was the rough rate of growth of the sequence of squares.

1. Here is a quick hint at a proof. At the beginning of
ANALYTIC NUMBER THEORY [IV.4] you will find a condition that
tells you precisely which numbers can be written as sums of
two squares. From this criterion it follows that “most”
numbers cannot. A careful count shows that if N is a large
integer, then there are many more expressions of the form
$m^2 + n^2$ with both $m^2$ and $n^2$ less than N than there
are numbers less than 2N that can be written as a sum of two
squares. Therefore there is a lot of duplication.
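
The duplication mentioned for question (i) can be observed directly on a small scale. The sketch below counts, for each n up to a limit, the ways of writing n as a sum of two positive squares (order ignored); restricting to positive squares and the function names are conventions chosen for this example.

```python
from collections import Counter

def two_square_counts(limit):
    """For each n <= limit, count pairs 1 <= m <= k with m^2 + k^2 = n."""
    counts = Counter()
    m = 1
    while 2 * m * m <= limit:
        k = m
        while m * m + k * k <= limit:
            counts[m * m + k * k] += 1
            k += 1
        m += 1
    return counts

def first_with_at_least(ways, limit):
    """Smallest n <= limit with at least `ways` representations, or None."""
    counts = two_square_counts(limit)
    candidates = [n for n, c in counts.items() if c >= ways]
    return min(candidates) if candidates else None

counts = two_square_counts(1000)
print(counts[50], counts[65], counts[325])
print(first_with_at_least(3, 1000))
```

Reaching a thousand representations by this kind of search would be impractical, which is why the counting argument sketched in the footnote is needed.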

2.8 Theoretical Computer Science 

This branch of mathematics is described at considerable 
length in part IV, so we shall be brief here. Broadly speak- 
ing, theoretical computer science is concerned with effi- 
ciency of computation, meaning the amounts of various 
resources, such as time and computer memory, needed 
to perform given computational tasks. There are math- 
ematical models of computation that allow one to study 
questions about computational efficiency in great gen- 
erality without having to worry about precise details 
of how algorithms are implemented. Thus, theoretical 
computer science is a genuine branch of pure mathe- 
matics: in theory, one could be an excellent theoretical 
computer scientist and be unable to program a com- 
puter. However, it has had many notable applications as 
well, especially to cryptography (see mathematics and
cryptography [VII.7] for more on this).

2.9 Probability 

There are many phenomena, from biology and eco- 
nomics to computer science and physics, that are so 
complicated that instead of trying to understand them 
in complete detail one tries to make probabilistic state- 
ments instead. For example, if you wish to analyze how 
a disease is likely to spread, you cannot hope to take 
account of all the relevant information (such as who will 
come into contact with whom) but you can build a math- 
ematical model and analyze it. Such models can have 
unexpectedly interesting behavior with direct practical 
relevance. For example, it may happen that there is a 
“critical probability” p with the following property: if the 
probability of infection after contact of a certain kind is 
above p then an epidemic may very well result, whereas 
if it is below p then the disease will almost certainly 
die out. A dramatic difference in behavior like this is 
called a phase transition. (See probabilistic models of 
critical phenomena [IV.26] for further discussion.) 
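
To see a critical probability emerge in the simplest possible setting, here is a minimal Python sketch. It rests on assumptions that are not in the text: a toy branching model in which each infected person meets a fixed number of people (the hypothetical parameter `contacts`) and infects each of them independently with probability p. Under those assumptions the chance that the outbreak dies out is the smallest solution q of q = (1 - p + pq)^contacts, which the illustrative function `extinction_probability` approximates by repeated substitution starting from q = 0.

```python
def extinction_probability(p: float, contacts: int = 3, iterations: int = 1000) -> float:
    """Chance that an outbreak started by one case eventually dies out.

    Assumed toy model (not from the text): every case meets `contacts`
    people and infects each one independently with probability p.
    The answer is the smallest q in [0, 1] with
    q = (1 - p + p*q) ** contacts, reached by iterating from q = 0.
    """
    q = 0.0
    for _ in range(iterations):
        q = (1.0 - p + p * q) ** contacts
    return q


# Compare infection probabilities on either side of 1/contacts.
for p in (0.2, 0.3, 0.4, 0.5):
    print(p, extinction_probability(p))
```

With three contacts the critical value in this toy model is 1/3, the point at which each case infects one other person on average. Below it the computed extinction probability is essentially 1; above it the value drops well below 1, so an epidemic becomes a real possibility.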

Setting up an appropriate mathematical model can be 
surprisingly difficult. For example, there are physical cir- 
cumstances where particles travel in what appears to be 
a completely random manner. Can one make sense of 
the notion of a random continuous path? It turns out
that one can (the result is the elegant theory of Brownian
motion [IV.25]), but the proof that one can is highly
sophisticated, roughly speaking because the set of all 
possible paths is so complex. 

2.10 Mathematical Physics 

The relationship between mathematics and physics has 
changed profoundly over the centuries. Up to the eigh- 
teenth century there was no sharp distinction drawn 
between mathematics and physics, and many famous 
mathematicians could also be regarded as physicists, 
at least some of the time. During the nineteenth cen- 
tury and the beginning of the twentieth century this 
situation gradually changed, until by the middle of the 
twentieth century the two disciplines were very sepa- 
rate. And then, toward the end of the twentieth cen- 
tury, mathematicians started to find that ideas that had 
been discovered by physicists had huge mathematical 
significance. 

There is still a big cultural difference between the two 
subjects: mathematicians are far more interested in finding
rigorous proofs, whereas physicists, who use mathematics
as a tool, are usually happy with a convincing
argument for the truth of a mathematical statement, 
even if that argument is not actually a proof. The result 
is that physicists, operating under less stringent con- 
straints, often discover fascinating mathematical phe- 
nomena long before mathematicians do. 

Finding rigorous proofs to back up these discoveries is
often extremely hard: it is far more than a pedantic exer- 
cise in certifying the truth of statements that no physi- 
cist seriously doubted. Indeed, it often leads to further 
mathematical discoveries. The articles vertex operator
algebras [IV.13], mirror symmetry [IV.14], general
relativity and the Einstein equations [IV.17],
and operator algebras [IV.19] describe some fascinating
examples of how mathematics and physics have
enriched each other. 


1.2 The Language and Grammar of Mathematics


1 Introduction 

It is a remarkable phenomenon that children can learn 
to speak without ever being consciously aware of the 
sophisticated grammar they are using. Indeed, adults 
too can live a perfectly satisfactory life without ever
thinking about ideas such as parts of speech, subjects, 
predicates, or subordinate clauses. Both children and 
adults can easily recognize ungrammatical sentences,
at least if the mistake is not too subtle, and to do this 
it is not necessary to be able to explain the rules that 
have been violated. Nevertheless, there is no doubt that 
one's understanding of language is hugely enhanced by 
a knowledge of basic grammar, and this understanding 
is essential for anybody who wants to do more with 
language than use it unreflectingly as a means to a 
nonlinguistic end. 

The same is true of mathematical language. Up to a 
point, one can do and speak mathematics without know- 
ing how to classify the different sorts of words one is 
using, but many of the sentences of advanced mathemat- 
ics have a complicated structure that is much easier to 
understand if one knows a few basic terms of mathemat- 
ical grammar. The object of this section is to explain the 
most important mathematical “parts of speech,” some 
of which are similar to those of natural languages and 
others quite different. These are normally taught right 
at the beginning of a university course in mathematics.
Much of The Companion can be understood without a 
precise knowledge of mathematical grammar, but a care- 
ful reading of this article will help the reader who wishes 
to follow some of the later, more advanced parts of the
book.

The main reason for using mathematical grammar is 
that the statements of mathematics are supposed to be 
completely precise, and it is not possible to achieve com- 
plete precision unless the language one uses is free of 
many of the vaguenesses and ambiguities of ordinary 
speech. Mathematical sentences can also be highly com- 
plex: if the parts that made them up were not clear and 
simple, then the unclarities would rapidly accumulate 
and render the sentences unintelligible. 

To illustrate the sort of clarity and simplicity that is 
needed in mathematical discourse, let us consider the 
famous mathematical sentence “Two plus two equals 
four” as a sentence of English rather than of mathemat- 
ics, and try to analyze it grammatically. On the face of it, 
it contains three nouns (“two,” “two,” and “four”), a verb 
(“equals”) and a conjunction (“plus”). However, looking 
more carefully we may begin to notice some oddities. 
For example, although the word “plus” resembles the 
word “and,” the most obvious example of a conjunction, 
it does not behave in quite the same way, as is shown 
by the sentence “Mary and Peter love Paris.” The verb in 
this sentence, “love,” is plural, whereas the verb in the 
previous sentence, “equals,” was singular. So the word 
“plus” seems to take two objects (which happen to be
numbers) and produce out of them a new, single object,
while “and” conjoins “Mary” and “Peter” in a looser way,
leaving them as distinct people. 

Reflecting on the word “and” a bit more, one finds that
it has two very different uses. One, as above, is to link 
two nouns, whereas the other is to join two whole sen- 
tences together, as in “Mary likes Paris and Peter likes 
New York.” If we want the basics of our language to be 
absolutely clear, then it will be important to be aware 
of this distinction. (When mathematicians are at their 
most formal, they simply outlaw the noun-linking use 
of “and”— a sentence such as “3 and 5 are prime num- 
bers” is then paraphrased as “3 is a prime number and 
5 is a prime number.”) 

This is but one of many similar questions: anybody 
who has tried to classify all words into the standard 
eight parts of speech will know that the classification is 
hopelessly inadequate. What, for example, is the role of 
the word “six” in the sentence “This section has six
subsections”? Unlike “two” and “four” earlier, it is certainly
not a noun. Since it modifies the noun “subsection” it 
would traditionally be classified as an adjective, but it 
does not behave like most adjectives: the sentences “My 
car is not very fast” and “Look at that tall building” are
perfectly grammatical, whereas the sentences “My car
is not very six” and “Look at that six building” are not
just nonsense but ungrammatical nonsense. So do we
classify adjectives further into numerical adjectives and
nonnumerical adjectives? Perhaps we do, but then our
troubles will be only just beginning. For example, what
about possessive adjectives such as “my” and “your”? In
general, the more one tries to refine the classification of 
English words, the more one realizes how many different 
grammatical roles there are. 

2 Four Basic Concepts 

Another word that famously has three quite distinct 
meanings is “is.” The three meanings are illustrated in
the following three sentences. 

(1) 5 is the square root of 25. 

(2) 5 is less than 10. 

(3) 5 is a prime number. 

In the first of these sentences, “is” could be replaced 
by “equals”: it says that two objects, 5 and the square 
root of 25, are in fact one and the same object, just as
it does in the English sentence “London is the capital of
the United Kingdom.” In the second sentence, “is” plays a
completely different role. The words “less than 10” form
an adjectival phrase, specifying a property that numbers
may or may not have, and “is” in this sentence is like “is”
in the English sentence “Grass is green.” As for the third
sentence, the word “is” there means “is an example of,” 
as it does in the English sentence “Mercury is a planet.” 

These differences are reflected in the fact that the
sentences cease to resemble each other when they are
written in a more symbolic way. An obvious way to write
(1) is 5 = √25. As for (2), it would usually be written
5 < 10, where the symbol “<” means “is less than.” The 
third sentence would normally not be written symboli- 
cally because the concept of a prime number is not quite 
basic enough to have universally recognized symbols 
associated with it. However, it is sometimes useful to 
do so, and then one must invent a suitable symbol. One 
way to do it would be to adopt the convention that if n 
is a positive integer, then P(n) stands for the sentence 
“n is prime.” Another way, which does not hide the word 
“is,” is to use the language of sets. 

2.1 Sets 

Broadly speaking, a set is a collection of objects, and in 
mathematical discourse these objects are mathematical
ones such as numbers, points in space, or even other 
sets. If we wish to rewrite sentence (3) symbolically, 
another way to do it is to define P to be the collection, 
or set, of all prime numbers. Then (3) can be rewritten, 
“5 belongs to the set P.” This notion of belonging to a set
is sufficiently basic to deserve its own symbol, and the
symbol used is “∈.” So a fully symbolic way of writing
the sentence is 5 ∈ P.

The members of a set are usually called its elements,
and the symbol “∈” is usually read “is an element of.”
So the “is” of sentence (3) is more like “∈” than “=.”
Although one cannot directly substitute the phrase “is
an element of” for “is,” one can do so if one is prepared 
to modify the rest of the sentence a little. 

There are three common ways to denote a specific
set. One is to list its elements inside curly brackets: 
{2, 3, 5, 7, 11, 13, 17, 19}, for example, is the set whose 
elements are the eight numbers 2, 3, 5, 7, 11, 13, 17, 
and 19. The majority of sets considered by mathemati- 
cians are too large for this to be feasible— indeed, they 
are often infinite — so a second way to denote sets is 
to use dots to imply a list that is too long to write 
down: for example, the expressions {1, 2, 3, ..., 100} and
{2, 4, 6, 8, ...} can be used to represent the set of all
positive integers up to 100 and the set of all positive even
numbers, respectively. A third way, and the way that
is most important, is to define a set via a property: an
example that shows how this is done is the expression
{x : x is prime and x < 20}. To read an expression such
as this, one first reads the opening curly bracket as “The
set of.” Next, one reads the symbol that occurs before 
the colon. The colon itself one reads as “such that.” 
Finally, one reads what comes after the colon, which is 
the property that determines the elements of the set. In 
this instance, we end up saying, “The set of x such that 
x is prime and x is less than 20,” which is in fact equal
to the set {2, 3, 5, 7, 11, 13, 17, 19} considered earlier. 
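
The three ways of denoting a set have close counterparts in programming languages that support sets. The following Python sketch is an illustration only: the helper `is_prime` and the variable names are introduced here for the purpose and are not part of the text, and the variable x is assumed to range over positive integers.

```python
import math


def is_prime(x: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if x < 2:
        return False
    return all(x % d != 0 for d in range(2, math.isqrt(x) + 1))


# First way: list the elements inside curly brackets.
listed = {2, 3, 5, 7, 11, 13, 17, 19}

# Second way: a pattern stands in for the dots in {1, 2, 3, ..., 100}.
up_to_100 = set(range(1, 101))

# Third way: {x : x is prime and x < 20}, defined via a property.
by_property = {x for x in range(1, 20) if is_prime(x)}

print(by_property == listed)  # do the two descriptions give the same set?
print(5 in by_property)       # the membership test "5 is an element of ..."
```

An infinite set such as {2, 4, 6, 8, ...} cannot be stored element by element in this way, which is one reason the definition via a property is the most important of the three.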

Many sentences of mathematics can be rewritten in 
set-theoretic terms. For example, sentence (2) earlier 
could be written as 5 ∈ {n : n < 10}. Often there is
no point in doing this (as here, where it is much easier
to write 5 < 10) but there are circumstances where
it becomes extremely convenient. For example, one of 
the great advances in mathematics was the use of Carte- 
sian coordinates to translate geometry into algebra and 
the way this was done was to define geometrical objects 
as sets of points, where points were themselves defined 
as pairs or triples of numbers. So, for example, the 
set {(x, y) : x^2 + y^2 = 1} is (or represents) a circle
of radius 1 with its center at the origin (0, 0). That is
because, by the Pythagorean theorem, the distance from
(0, 0) to (x, y) is √(x^2 + y^2), so the sentence
“x^2 + y^2 = 1” can be reexpressed geometrically as “the distance
from (0, 0) to (x, y) is 1.” If all we ever cared about was
which points were in the circle, then we could make do
with sentences such as “x^2 + y^2 = 1,” but in geometry
one often wants to consider the entire circle as a single 
object (rather than as a multiplicity of points, or as a 
property that points might have), and then set-theoretic 
language is indispensable. 

A second circumstance where it is usually hard to do 
without sets is when one is defining new mathematical 
objects. Very often such an object is a set together with 
a mathematical structure imposed on it, which takes 
the form of certain relationships among the elements 
of the set. For examples of this use of set-theoretic
language, see sections 1 and 2, on number systems and
algebraic structures, respectively, in some fundamental
mathematical definitions [1.3].

Sets are also very useful if one is trying to do meta- 
mathematics, that is, to prove statements not about 
mathematical objects but about the process of mathe- 
matical reasoning itself. For this it helps a lot if one can 
devise a very simple language— with a small vocabulary 
and an uncomplicated grammar— into which it is in prin- 
ciple possible to translate all mathematical arguments. 
Sets allow one to reduce greatly the number of parts of 
speech that one needs, turning almost all of them into 
nouns. For example, with the help of the membership 
symbol “∈” one can do without adjectives, as the
translation of “5 is a prime number” (where “prime” functions
as an adjective) into “5 ∈ P” has already suggested. 1
This is of course an artificial process (imagine replacing
“roses are red” by “roses belong to the set R”), but
in this context it is not important for the formal language 
to be natural and easy to understand. 

2.2 Functions 

Let us now switch attention from the word “is” to some 
other parts of the sentences (1)-(3), focusing first on
the phrase “the square root of” in sentence (1). If we 
wish to think about this phrase grammatically, then we 
should analyze what sort of role it plays in a sentence, 
and the analysis is simple: in virtually any mathemati- 
cal sentence where the phrase appears, it is followed by 
the name of a number. If the number is n, then this pro- 
duces the slightly longer phrase, “the square root of n,” 
which is a noun phrase that denotes a number and plays 
the same grammatical role as a number (at least when 
the number is used as a noun rather than as an adjective).
For instance, replacing “5” by “the square root of
25” in the sentence “5 is less than 7” yields a new sen- 
tence, “The square root of 25 is less than 7,” that is still 
grammatically correct (and true). 

One of the most basic activities of mathematics is to 
take a mathematical object and transform it into another 
one, sometimes of the same kind and sometimes not. 
“The square root of” transforms numbers into numbers, 
as do “four plus,” “two times,” “the cosine of,” and “the 
logarithm of.” A nonnumerical example is “the center of
gravity of,” which transforms geometrical shapes (provided
they are not too exotic or complicated to have a
center of gravity) into points, meaning that if S stands
for a shape, then “the center of gravity of S” stands for
a point. A function is, roughly speaking, a mathematical 
transformation of this kind. 

It is not easy to make this definition more precise. To 
ask, “What is a function?” is to suggest that the answer 
should be a thing of some sort, but functions seem to 
be more like processes. Moreover, when they appear in 
mathematical sentences they do not behave like nouns.
(They are more like prepositions, though with a defi- 
nite difference that will be discussed in the next subsec- 
tion.) One might therefore think it inappropriate to ask 
what kind of object “the square root of” is. Should one 
not simply be satisfied with the grammatical analysis 
already given? 


1. For another discussion of adjectives see arithmetic geometry
[IV.6 §3.1].


As it happens, no. Over and over again, throughout 
mathematics, it is useful to think of a mathematical phe- 
nomenon, which may be complex and very un-thinglike, 
as a single object. We have already seen a simple exam- 
ple: a collection of infinitely many points in the plane 
or space is sometimes better thought of as a single geo- 
metrical shape. Why should one wish to do this for func- 
tions? Here are two reasons. First, it is convenient to be 
able to say something like, “The derivative of sin is cos,”
or to speak in general terms about some functions being 
differentiable and others not. More generally, functions 
can have properties, and in order to discuss those prop- 
erties one needs to think of functions as things. Second, 
many algebraic structures are most naturally thought of
as sets of functions. (See, for example, the discussion
of groups and symmetry in [1.3 §2.1]. See also Hilbert
spaces [III.37], function spaces [III.29], and vector
spaces [1.3 §2.3].)

If f is a function, then the notation f(x) = y means
that f turns the object x into the object y. Once one
starts to speak formally about functions, it becomes
important to specify exactly which objects are to be
subjected to the transformation in question, and what sort
of objects they can be transformed into. One of the main
reasons for this is that it makes it possible to discuss 
another notion that is central to mathematics, that of 
inverting a function. (See [1.4 §1] for a discussion of why 
it is central.) Roughly speaking, the inverse of a function 
is another function that undoes it, and that it undoes; for 
example, the function that takes a number n to n - 4 is 
the inverse of the function that takes n to n + 4, since if 
you add four and then subtract four, or vice versa, you 
get the number you started with. 

Here is a function f that cannot be inverted. It takes
each number and replaces it by the nearest multiple
of 100, rounding up if the number ends in 50. Thus,
f(113) = 100, f(3879) = 3900, and f(1050) = 1100.
It is clear that there is no way of undoing this process
with a function g. For example, in order to undo the
effect of f on the number 113 we would need g(100)
to equal 113. But the same argument applies to every
number that is at least as big as 50 and smaller than
150, and g(100) cannot be more than one number at
once. 
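
Both examples can be written out explicitly, which makes the contrast easy to check. The sketch below assumes that the numbers in question are nonnegative integers; the names `add_four`, `subtract_four`, and `round_to_nearest_hundred` are chosen here for illustration.

```python
def add_four(n: int) -> int:
    return n + 4


def subtract_four(n: int) -> int:
    return n - 4


def round_to_nearest_hundred(n: int) -> int:
    """Nearest multiple of 100, rounding up when n ends in 50 (assumes n >= 0)."""
    return (n + 50) // 100 * 100


# Adding four and subtracting four undo each other, in either order.
assert all(subtract_four(add_four(n)) == n for n in range(1000))
assert all(add_four(subtract_four(n)) == n for n in range(1000))

# The three values quoted in the text.
assert round_to_nearest_hundred(113) == 100
assert round_to_nearest_hundred(3879) == 3900
assert round_to_nearest_hundred(1050) == 1100

# Collect every number below 300 that is sent to 100.
sent_to_100 = [n for n in range(300) if round_to_nearest_hundred(n) == 100]
print(min(sent_to_100), max(sent_to_100), len(sent_to_100))
```

Since many different inputs share the output 100, no single value of g(100) could recover all of them.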

Now let us consider the function that doubles a num- 
ber. Can this be inverted? Yes it can, one might say: just 
divide the number by two again. And much of the time 
this would be a perfectly sensible response, but not, for 
example, if it was clear from the context that the num- 
bers being talked about were positive integers. Then one 
might be focusing on the difference between even and 
odd numbers, and this difference could be encapsulated
by saying that odd numbers are precisely those numbers
n for which the equation 2x = n does not have a
solution. (Notice that one can undo the doubling process by
halving. The problem here is that the relationship is not 
symmetrical: there is no function that can be undone 
by doubling, since you could never get back to an odd 
number.) 

To specify a function, therefore, one must be careful 
to specify two sets as well: the domain, which is the set
of objects to be transformed, and the range, which is the
set of objects they are allowed to be transformed into. A
function f from a set A to a set B is a rule that specifies,
for each element x of A, an element y = f(x) of B. (Not
every element of the range needs to be used: consider 
once again the example of “two times” when the domain 
and range are both the set of all positive integers.) 

The following symbolic notation is used. The expression
f : A → B means that f is a function with domain
A and range B. If we then write f(x) = y, we know that
x must be an element of A and y must be an element
of B. Another way of writing f(x) = y that is sometimes
more convenient is f : x ↦ y. (The bar on the arrow is
to distinguish it from the arrow in f : A → B, which has
a very different meaning.)

If we want to undo the effect of a function f : A → B,
then we can, as long as we avoid the problem that
occurred with the approximating function discussed
earlier. That is, we can do it if f(x) and f(x') are
different whenever x and x' are different elements of A. If
this condition holds, then f is called an injection. On the
other hand, if we want to find a function g that is undone
by f, then we can do so as long as we avoid the problem
of the integer-doubling function. That is, we can do it if
every element y of B is equal to f(x) for some element x
of A (so that we have the option of setting g(y) = x). If
this condition holds, then f is called a surjection. If f
is both an injection and a surjection, then f is called a
bijection. Bijections are precisely the functions that have 
inverses. 
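
For functions between finite sets these three conditions can be tested directly. The following Python sketch is one possible way to do so; the function names and the small example sets are assumptions made for illustration, and the parameter `target` plays the role of the set B that the text calls the range.

```python
def is_injection(f, domain) -> bool:
    """Different elements of the domain always have different images."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)


def is_surjection(f, domain, target) -> bool:
    """Every element of the target set is f(x) for some x in the domain."""
    return set(target) <= {f(x) for x in domain}


def is_bijection(f, domain, target) -> bool:
    return is_injection(f, domain) and is_surjection(f, domain, target)


def double(n: int) -> int:
    return 2 * n


def round_to_nearest_hundred(n: int) -> int:
    return (n + 50) // 100 * 100


A = range(1, 6)

# Doubling from {1, ..., 5} to {1, ..., 10}: odd numbers are never reached.
print(is_injection(double, A), is_surjection(double, A, range(1, 11)))

# Shrinking the target to the even numbers makes doubling a bijection.
print(is_bijection(double, A, [2, 4, 6, 8, 10]))

# Rounding {50, ..., 149} onto {100} reaches everything but merges inputs.
print(is_injection(round_to_nearest_hundred, range(50, 150)),
      is_surjection(round_to_nearest_hundred, range(50, 150), [100]))
```

The second example shows why the choice of range matters: the same rule "two times" is or is not a bijection depending on the set B it is declared to map into.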

It is important to realize that not all functions have 
tidy definitions. Here, for example, is the specification 
of a function from the positive integers to the positive 
integers: f(n) = n if n is a prime number, f(n) = k if
n is of the form 2^k for an integer k greater than 1, and
f(n) = 13 for all other positive integers n. This function 
has an unpleasant, arbitrary definition but it is neverthe- 
less a perfectly legitimate function. Indeed, “most” func- 
tions, though not most functions that one actually uses, 
are so arbitrary that they cannot be defined. (Such
functions may not be useful as individual objects, but they
are needed so that the set of all functions from one set
to another has an interesting mathematical structure.) 
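
Even an arbitrary-looking rule like the one above is easy to state precisely. A possible Python rendering follows; the helper `is_prime` is an assumption of the sketch, and the test for powers of 2 relies on the length of the binary representation of n.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division; adequate for small inputs."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


def f(n: int) -> int:
    """The three-case function from the positive integers to themselves."""
    if n < 1:
        raise ValueError("f is defined only on the positive integers")
    if is_prime(n):
        return n
    k = n.bit_length() - 1      # the only k for which n could equal 2**k
    if k > 1 and n == 2 ** k:
        return k
    return 13


print([f(n) for n in range(1, 17)])
```

The cases do not collide: the only prime that is a power of 2 is 2 itself, and it is excluded from the second case by the requirement that k be greater than 1.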

2.3 Relations 

Let us now think about the grammar of the phrase “less 
than” in sentence (2). As with “the square root of,” it 
must always be followed by a mathematical object (in 
this case a number again). Once we have done this we 
obtain a phrase such as “less than n,” which is impor- 
tantly different from “the square root of n” because it 
behaves like an adjective rather than a noun, and refers 
to a property rather than an object. This is just how 
prepositions behave in English: look, for example, at 
the word “under” in the sentence “The cat is under the 
table.” 

At a slightly higher level of formality, mathematicians 
like to avoid too many parts of speech, as we have 
already seen for adjectives. So there is no symbol for 
“less than”: instead, it is combined with the previous 
word “is” to make the phrase “is less than,” which is 
denoted by the symbol “<.” The grammatical rules for 
this symbol are once again simple. To use “<” in a sen- 
tence, one should precede it by a noun and follow it 
by a noun. For the resulting grammatically correct sen- 
tence to make sense, the nouns should refer to numbers 
(or perhaps to more general objects that can be put in 
order). A mathematical “object” that behaves like this is
called a relation, though it might be more accurate to call 
it a potential relationship. “Equals” and “is an element 
of” are two other examples of relations. 

As with functions, it is important, when specifying 
a relation, to be careful about which objects are to be 
related. Usually a relation comes with a set A of objects 
that may or may not be related to each other. For exam- 
ple, the relation “<” might be defined on the set of all 
positive integers, or alternatively on the set of all real 
numbers; strictly speaking these are different relations. 
Sometimes relations are defined with reference to two 
sets A and B. For example, if the relation is “∈,” then A
might be the set of all positive integers and B the set of 
all sets of positive integers. 

There are many situations in mathematics where one 
wishes to regard different objects as “essentially the 
same,” and to help us make this idea precise there is 
a very important class of relations known as equiva- 
lence relations. Here are two examples. First, in elemen- 
tary geometry one sometimes cares about shapes but 
not about sizes. Two shapes are said to be similar if 
one can be transformed into the other by a combination
of reflections, rotations, translations, and enlargements
(see figure 1); the relation “is similar to” is an
equivalence relation. Second, when doing arithmetic
modulo m [III.61], one does not wish to distinguish
between two whole numbers that differ by a multiple
of m: in this case one says that the numbers are congruent
(mod m); the relation “is congruent (mod m) to” is
another equivalence relation.

Figure 1 Similar shapes.

What exactly is it that these two relations have in com- 
mon? The answer is that they both take a set (in the first 
case the set of all geometrical shapes, and in the sec- 
ond the set of all whole numbers) and split it into parts, 
called equivalence classes, where each part consists of 
objects that one wishes to regard as essentially the same. 
In the first example, a typical equivalence class is the 
set of all shapes that are similar to some given shape; 
in the second, it is the set of all integers that leave a 
given remainder when you divide by m (for example, if 
m = 7 then one of the equivalence classes is the set 
{...,-16,-9,-2,5,12,19,...}). 
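
The splitting into equivalence classes can be carried out mechanically on a finite stretch of the integers. The sketch below groups the integers from -16 to 19 by their remainder on division by 7; the function name `equivalence_classes` and the chosen interval are assumptions made for illustration. It relies on the fact that Python's `%` operator returns a nonnegative remainder when the modulus is positive.

```python
from collections import defaultdict


def equivalence_classes(elements, m: int) -> dict:
    """Group integers by their remainder on division by m."""
    classes = defaultdict(list)
    for n in elements:
        classes[n % m].append(n)
    return dict(classes)


classes = equivalence_classes(range(-16, 20), 7)

for remainder in sorted(classes):
    print(remainder, classes[remainder])
```

Every integer in the interval lands in exactly one of the seven lists, which is what it means for the classes to split the set into parts.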

An alternative definition of what it means for a relation
~, defined on a set A, to be an equivalence relation
is that it has the following three properties. First, it is 
reflexive, which means that x ~ x for every x in A. Sec- 
ond, it is symmetric, which means that if x and y are 
elements of A and x ~ y, then it must also be the case 
that y ~ x. Third, it is transitive, meaning that if x, y, 
and z are elements of A such that x ~ y and y ~ z, 
then it must be the case that x ~ z. (To get a feel for 
these properties, it may help if you satisfy yourself that 
the relations “is similar to” and “is congruent (mod m) 
to” both have all three properties, while the relation “<,”
defined on the positive integers, is transitive but neither 
reflexive nor symmetric.) 
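
On a finite set the three properties can be checked by brute force. The following Python sketch does this for congruence (mod 7) and for “<” on the integers from 1 to 20; the function names and the choice of sample set are assumptions made here. A check of this kind on a finite sample can refute a property by finding a counterexample, but it is only supporting evidence, not a proof, that the property holds on all positive integers.

```python
from itertools import product


def is_reflexive(rel, A) -> bool:
    return all(rel(x, x) for x in A)


def is_symmetric(rel, A) -> bool:
    return all(rel(y, x) for x, y in product(A, repeat=2) if rel(x, y))


def is_transitive(rel, A) -> bool:
    return all(rel(x, z)
               for x, y, z in product(A, repeat=3)
               if rel(x, y) and rel(y, z))


def congruent_mod_7(x: int, y: int) -> bool:
    return (x - y) % 7 == 0


def less_than(x: int, y: int) -> bool:
    return x < y


A = range(1, 21)
for name, rel in [("congruent (mod 7)", congruent_mod_7), ("<", less_than)]:
    print(name, is_reflexive(rel, A), is_symmetric(rel, A), is_transitive(rel, A))
```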

One of the main uses of equivalence relations is to 
make precise the notion of quotient [1.3 §3.3] constructions.


2.4 Binary Operations 

Let us return to one of our earlier examples, the sentence 
“Two plus two equals four.” We have analyzed the word 
“equals” as a relation, an expression that sits between 
the noun phrases “two plus two” and “four” and makes 
a sentence out of them. But what about “plus”? That also 
sits between two nouns. However, the result, “two plus 
two,” is not a sentence but a noun phrase. That pattern is 
characteristic of binary operations. Some familiar exam- 
ples of binary operations are “plus,” “minus,” “times,” 
“divided by,” and “raised to the power.”

As with functions, it is customary, and convenient, to
be careful about the set to which a binary operation is
applied. From a more formal point of view, a binary
operation on a set A is a function that takes pairs of elements
of A and produces further elements of A from them. To
be more formal still, it is a function with the set of all
pairs (x, y) of elements of A as its domain and with A
as its range. This way of looking at it is not reflected in
the notation, however, since the symbol for the operation
comes between x and y rather than before them:
we write x + y rather than +(x, y).

There are four properties that a binary operation may 
have that are very useful if one wants to manipulate sen- 
tences in which it appears. Let us use the symbol * to 
denote an arbitrary binary operation on some set A. The 
operation * is said to be commutative if x * y is always 
equal to y * x, and associative if x * (y * z) is always 
equal to (x * y) * z. For example, the operations “plus” 
and “times” are commutative and associative, whereas 
“minus,” “divided by,” and “raised to the power” are nei- 
ther (for instance, 9 - (5 - 3) = 7 while (9 - 5) - 3 = 1). 
These last two operations raise another issue: unless the 
set A is chosen carefully, they may not always be defined. 
For example, if one restricts one’s attention to the positive
integers, then the expression 3 - 5 has no meaning.
There are two conventions one could imagine adopting 
in response to this. One might decide not to insist that 
a binary operation should be defined for every pair of 
elements of A, and to regard it as a desirable extra prop- 
erty of an operation if it is defined everywhere. But the 
convention actually in force is that binary operations do 
have to be defined everywhere, so that “minus,” though 
a perfectly good binary operation on the set of all integers,
is not a binary operation on the set of all positive
integers. 
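
These properties can be explored on a small sample of positive integers. The Python sketch below is illustrative only: the helper names are assumptions, and a check that succeeds on a sample does not prove that an operation is commutative or associative in general, although a check that fails does exhibit a genuine counterexample.

```python
import operator
from itertools import product


def is_commutative(op, sample) -> bool:
    return all(op(x, y) == op(y, x) for x, y in product(sample, repeat=2))


def is_associative(op, sample) -> bool:
    return all(op(x, op(y, z)) == op(op(x, y), z)
               for x, y, z in product(sample, repeat=3))


def stays_in_positive_integers(op, sample) -> bool:
    """Does op send every pair from the sample to a positive integer?"""
    results = (op(x, y) for x, y in product(sample, repeat=2))
    return all(isinstance(r, int) and r > 0 for r in results)


sample = range(1, 6)
operations = [("plus", operator.add), ("times", operator.mul),
              ("minus", operator.sub), ("raised to the power", operator.pow)]

for name, op in operations:
    print(name, is_commutative(op, sample), is_associative(op, sample))

# "Minus" leaves the positive integers (for example 3 - 5), "plus" does not.
print(stays_in_positive_integers(operator.sub, sample),
      stays_in_positive_integers(operator.add, sample))
```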

An element e of A is called an identity for * if e * x =
x * e = x for every element x of A. The two most obvious
examples are 0 and 1, which are identities for “plus” and
“times,” respectively. Finally, if * has an identity e and
x belongs to A, then an inverse for x is an element y
such that x * y = y * x = e. For example, if * is “plus”
then the inverse of x is -x, while if * is “times” then 
the inverse is 1/x. 

These basic properties of binary operations are fundamental
to the structures of abstract algebra. See four
important algebraic structures [1.3 §2] for further
details. 

3 Some Elementary Logic 

3.1 Logical Connectives 

A logical connective is the mathematical equivalent of a 
conjunction. That is, it is a word (or symbol) that joins 
two sentences to produce a new one. We have already 
discussed an example, namely “and” in its sentence- 
linking meaning, which is sometimes written with the
symbol “∧,” particularly in more formal or abstract
mathematical discourse. If P and Q are statements (note here
the mathematical habit of representing not just numbers
but any objects whatsoever by single letters), then
P ∧ Q is the statement that is true if and only if both P
and Q are true. 

Another connective is the word “or,” a word that has 
a more specific meaning for mathematicians than it 
has for normal speakers of the English language. The 
mathematical use is illustrated by the tiresome joke of 
responding, “Yes please,” to a question such as, “Would
you like your coffee with or without sugar?” The symbol
for “or,” if one wishes to use a symbol, is “∨,” and the
statement P ∨ Q is true if and only if P is true or Q is
true. This is taken to include the case when they are both 
true, so “or,” for mathematicians, is always the so-called 
inclusive version of the word. 

A third important connective is “implies,” which is 
usually written “⇒.” The statement P ⇒ Q means,
roughly speaking, that Q is a consequence of P, and is 
sometimes read as “if P then Q.” However, as with “or,” 
this does not mean quite what it would in English. To 
get a feel for the difference, consider the following even 
more extreme example of mathematical pedantry. At the 
supper table, my young daughter once said, “Put your 
hand up if you are a girl.” One of my sons, to tease her,
put his hand up on the grounds that, since she had not
added, “and keep it down if you are a boy,” his doing so 
was compatible with her command. 

Something like this attitude is taken by mathemati- 
cians to the word “implies,” or to sentences containing 
the word “if.” The statement P ⇒ Q is considered to be
true under all circumstances except one: it is not true if P
is true and Q is false. This is the definition of “implies.” It
can be confusing because in English the word “implies”
suggests some sort of connection between P and Q, that 
P in some way causes Q or is at least relevant to it. If P 
causes Q then certainly P cannot be true without Q being 
true, but all a mathematician cares about is this logical 
consequence and not whether there is any reason for it. 
Thus, if you want to prove that P ⇒ Q, all you have to do
is rule out the possibility that P could be true and Q false 
at the same time. To give an example: if n is a positive 
integer, then the statement “n is a perfect square with 
final digit 7” implies the statement “n is a prime num- 
ber,” not because there is any connection between the 
two but because no perfect square ends in a 7. Of course, 
implications of this kind are less interesting mathemat- 
ically than more genuine-seeming ones, but the reward 
for accepting them is that, once again, one avoids being 
confused by some of the ambiguities and subtle nuances 
of ordinary language. 
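The truth-table definitions of the connectives can be checked mechanically. The following Python sketch is only an illustration: the function names and the bound of 10,000 are arbitrary choices, not part of the text. It tabulates “and,” inclusive “or,” and “implies,” and then confirms, for every n up to the bound, that “n is a perfect square with final digit 7” implies “n is a prime number” simply because the hypothesis is never true.

```python
from itertools import product
from math import isqrt

def implies(p: bool, q: bool) -> bool:
    """P implies Q is false only when P is true and Q is false."""
    return (not p) or q

# One row per assignment: (P, Q, P and Q, P or Q, P implies Q).
truth_table = [
    (p, q, p and q, p or q, implies(p, q))
    for p, q in product([True, False], repeat=2)
]

def is_square_ending_in_7(n: int) -> bool:
    return isqrt(n) ** 2 == n and n % 10 == 7

def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

BOUND = 10_000  # arbitrary illustrative bound, not taken from the text

# The implication holds for every n tested ...
vacuously_true = all(
    implies(is_square_ending_in_7(n), is_prime(n)) for n in range(1, BOUND + 1)
)
# ... because the hypothesis is never satisfied.
hypothesis_ever_true = any(is_square_ending_in_7(n) for n in range(1, BOUND + 1))
```

A finite check of this kind is of course not a proof; the proof is the observation that no perfect square ends in a 7.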

3.2 Quantifiers 

Yet another ambiguity in the English language is ex- 
ploited by the following old joke that suggests that our 
priorities need to be radically rethought. 

(4) Nothing is better than lifelong happiness. 

(5) But a cheese sandwich is better than nothing. 

(6) Therefore, a cheese sandwich is better than life- 
long happiness. 

Let us try to be precise about how this play on words 
works (a good way to ruin any joke, but not a tragedy in 
this case). It hinges on the word “nothing,” which is used 
in two different ways. The first sentence means “There
is no single thing that is better than lifelong happiness,”
whereas the second means “It is better to have a cheese
sandwich than to have nothing at all.” In other words,
in the second sentence, “nothing” stands for what one
might call the null option, the option of having nothing,
whereas in the first it does not (to have nothing is not 
better than to have lifelong happiness). 

Words like “all,” “some,” “any,” “every,” and “nothing” 
are called quantifiers, and in the English language they 
are highly prone to this kind of ambiguity. Mathemati- 
cians therefore make do with just two quantifiers, and 
the rules for their use are much stricter. They tend to 
come at the beginning of sentences, and can be read 
as “for all” (or “for every”) and “there exists” (or “for 
some”). A rewriting of sentence (4) that renders it unam- 
biguous (and much less like a real English sentence) 
is 

(4') For all x, lifelong happiness is better than x. 

The second sentence cannot be rewritten in these
terms because the word “nothing” is not playing the role
of a quantifier. (Its nearest mathematical equivalent is
something like the empty set, that is, the set with no
elements.)

Armed with “for all” and “there exists,” we can be 
clear about the difference between the beginnings of the 
following sentences. 

(7) Everybody likes at least one drink, namely water. 

(8) Everybody likes at least one drink; I myself go for 
red wine. 

The first sentence makes the point (not necessarily cor- 
rectly) that there is one drink that everybody likes, 
whereas the second claims merely that we all have some- 
thing we like to drink, even if that something varies from 
person to person. The precise formulations that capture 
the difference are as follows. 

(7') There exists a drink D such that, for every person 
P, P likes D. 

(8') For every person P there exists a drink D such that 
P likes D. 

This illustrates an important general principle: if you 
take a sentence that begins “for every x there exists y 
such that ...” and interchange the two parts so that it 
now begins “there exists y such that, for every x, . . . ,” 
then you obtain a much stronger statement, since y is 
no longer allowed to depend on x. If the second state- 
ment is still true— that is, if you really can choose a y 
that works for all the x at once— then the first statement 
is said to hold uniformly. 
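The effect of interchanging the two quantifiers can be seen on a small finite example. In the Python sketch below, the people, the drinks, and the function names are all invented for illustration; `all` plays the role of “for every” and `any` the role of “there exists.”

```python
# Hypothetical data: which drinks each person likes.
likes = {
    "Ann": {"water", "tea"},
    "Bob": {"water", "red wine"},
    "Cat": {"water", "coffee"},
}
people = list(likes)
drinks = set().union(*likes.values())

def for_all_exists(likes, people, drinks):
    """(8'): for every person P there exists a drink D such that P likes D."""
    return all(any(d in likes[p] for d in drinks) for p in people)

def exists_for_all(likes, people, drinks):
    """(7'): there exists a drink D such that, for every person P, P likes D."""
    return any(all(d in likes[p] for p in people) for d in drinks)

# If Bob stops liking water, (8') survives but the stronger (7') fails.
likes_no_common = {**likes, "Bob": {"red wine"}}
```

With the original data both statements hold, since water works for everybody at once; with `likes_no_common` only the weaker statement holds, because the drink now has to depend on the person.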

The symbols ∀ and ∃ are often used to stand for
“for all” and “there exists,” respectively. This allows us 
to write quite complicated mathematical sentences in a 
highly symbolic form if we want to. For example, sup- 
pose we let P be the set of all primes, as we did earlier. 
Then the following symbols make the claim that there 
are infinitely many primes, or rather a slightly different 
claim that is equivalent to it. 

(9) ∀n ∃m (m > n) ∧ (m ∈ P).

In words, this says that for every n we can find some
m that is both bigger than n and a prime. If we wish to
unpack sentence (9) further, we could replace the part
m ∈ P by

(10) ∀a,b ab = m ⇒ ((a = 1) ∨ (b = 1)).

There is one final important remark to make about the 
quantifiers “∀” and “∃.” I have presented them as if they
were freestanding, but actually a quantifier is always
associated with a set (one says that it quantifies over that
set). For example, sentence (10) would not be a translation
of the sentence “m is prime” if a and b were allowed
to be fractions: if a = 3 and b = 7/3 then ab = 7 without
either a or b equaling 1, but this does not show that
7 is not a prime. Implicit in the opening symbols ∀a,b
is the idea that a and b are intended to be positive integers.
If this had not been clear from the context, then we
could have used the symbol N (which stands for the set
of all positive integers) and started sentence (10) with
∀a, b ∈ N instead.
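The dependence of sentence (10) on the set being quantified over can be made concrete. The Python sketch below is an illustration under stated assumptions: the function names are invented, and the search range 1..m is justified by the remark that ab = m forces both a and b to be at most m when they are positive integers.

```python
from fractions import Fraction

def sentence_10(m: int) -> bool:
    """For all positive integers a, b: ab = m implies (a = 1 or b = 1).

    Only a, b in 1..m need checking, since ab = m forces both to be at most m.
    """
    return all(
        (a * b != m) or (a == 1 or b == 1)
        for a in range(1, m + 1)
        for b in range(1, m + 1)
    )

# If a and b were allowed to be fractions, 7 would have a "factorization"
# in which neither factor equals 1, so the translation would break down.
a, b = Fraction(3), Fraction(7, 3)
fraction_counterexample = (a * b == 7) and a != 1 and b != 1

def next_prime_after(n: int) -> int:
    """A witness m for sentence (9): some m > n that satisfies sentence (10).

    Assumption of this sketch: the search starts at 2, because sentence (10)
    as literally written is also satisfied by m = 1.
    """
    m = max(n + 1, 2)
    while not sentence_10(m):
        m += 1
    return m
```

For instance, `next_prime_after(13)` searches upward from 14 and stops at 17.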

3.3 Negation 

The basic idea of negation in mathematics is very simple:
there is a symbol, “¬,” which means “not,” and if P
is any mathematical statement, then ¬P stands for the
statement that is true if and only if P is not true. However,
this is another example of a word that has a slightly
more restricted meaning to mathematicians than it has 
in ordinary speech. 

To illustrate this phenomenon once again, let us take 
A to be a set of positive integers and ask ourselves what 
the negation is of the sentence “Every number in the set 
A is odd.” Many people when asked this question will 
suggest, “Every number in the set A is even.” However, 
this is wrong: if one thinks carefully about what exactly 
would have to happen for the first sentence to be false, 
one realizes that all that is needed is that at least one 
number in A should be even. So in fact the negation of
the sentence is, “There exists a number in A that is even.” 
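The difference between the tempting answer and the correct one shows up as soon as A contains both odd and even numbers. In the following Python sketch (the function names and the sample set are chosen purely for illustration), `all` stands for “every” and `any` for “there exists.”

```python
def every_number_is_odd(A):
    return all(n % 2 == 1 for n in A)

def every_number_is_even(A):
    """The tempting but incorrect 'negation'."""
    return all(n % 2 == 0 for n in A)

def some_number_is_even(A):
    """The correct negation of 'every number in A is odd'."""
    return any(n % 2 == 0 for n in A)

A = {1, 2, 3}  # a mixed set: neither all odd nor all even

# Sample sets on which the correct negation can be compared with "not".
SAMPLES = [set(), {1}, {2}, {1, 3}, {2, 4}, {1, 2}, A]
negation_agrees = all(
    some_number_is_even(S) == (not every_number_is_odd(S)) for S in SAMPLES
)
```

For the mixed set, “every number is odd” and “every number is even” are both false, so the second cannot be the negation of the first.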

What explains the temptation to give the first, incor- 
rect answer? One possibility emerges when one writes 
the sentence more formally, thus: 

(11) ∀n ∈ A n is odd.

The first answer is obtained if one negates just the last 
part of this sentence, “n is odd”; but what is asked for 
is the negation of the whole sentence. That is, what is 
wanted is not 

(12) ∀n ∈ A ¬(n is odd),

but rather

(13) ¬(∀n ∈ A n is odd),

which is equivalent to

(14) ∃n ∈ A n is even.

A second possible explanation is that one is inclined (for
psycholinguistic reasons) to think of the phrase “every 
element of A” as denoting something like a single, typ- 
ical element of A. If that comes to have the feel of a 
particular number n, then we may feel that the negation 
of “n is odd” is “n is even.” The remedy is not to think 
of the phrase “every element of A” on its own: it should 
always be part of the longer phrase, “for every element
of A.”

3.4 Free and Bound Variables 

Suppose we say something like, “At time t the speed of 
the projectile is v.” The letters t and v stand for real 
numbers, and they are called variables, because in the 
back of our mind is the idea that they are changing. 
More generally, a variable is any letter used to stand for 
a mathematical object, whether or not one thinks of that 
object as changing through time. Let us look once again 
at the formal sentence that said that a positive integer 
m is prime: 

(10) ∀a,b ab = m ⇒ ((a = 1) ∨ (b = 1)).

In this sentence, there are three variables, a, b, and m, 
but there is a very important grammatical and semantic 
difference between the first two and the third. Here are 
two results of that difference. First, the sentence does 
not really make sense unless we already know what m is 
from the context, whereas it is important that a and b do 
not have any prior meaning. Second, while it makes per- 
fect sense to ask, “For which values of m is sentence (10) 
true?” it makes no sense at all to ask, “For which values 
of a is sentence (10) true?” The letter m in sentence (10) 
stands for a fixed number, not specified in this sentence, 
while the letters a and b, because of the initial ∀a, b, do
not stand for numbers— rather, in some way they search 
through all pairs of positive integers, trying to find a pair 
that multiply together to give m. Another sign of the 
difference is that you can ask, “What number is m?” but 
not, “What number is a?” A fourth sign is that the mean- 
ing of sentence (10) is completely unaffected if one uses 
different letters for a and b, as in the reformulation 

(10') ∀c,d cd = m ⇒ ((c = 1) ∨ (d = 1)).

One cannot, however, change m to n without establish- 
ing first that n denotes the same integer as m. A vari- 
able such as m, which denotes a specific object, is called 
a free variable. It sort of hovers there, free to take any 
value. A variable like a and b, of the kind that does
not denote a specific object, is called a bound variable,
or sometimes a dummy variable. (The word “bound”
is used mainly when the variable appears just after a
quantifier, as in sentence (10).)

Yet another indication that a variable is a dummy 
variable is when the sentence in which it occurs can 
be rewritten without it. For example, the notation 
∑_{n=1}^{100} f(n) is shorthand for f(1) + f(2) + ⋯ + f(100),
and the second way of writing it does not involve the
letter n, so n was not really standing for anything in
the first way. Sometimes, actual elimination is not possible,
but one feels it could be done in principle. For
instance, the sentence “For every real number x, x is
either positive, negative, or zero” is a bit like putting 
together infinitely many sentences such as “t is either 
positive, negative, or zero,” one for each real number t, 
none of which involve a variable. 
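Programming languages draw the same distinction. In the Python sketch below (the function f and all names are arbitrary illustrations), the loop variables inside `sum` and `all` are bound and can be renamed freely, whereas the parameter m is free in the body of the sentence and must be supplied from outside.

```python
def f(k: int) -> int:
    return k * k  # an arbitrary illustrative function

# The summation index is a dummy variable: renaming it changes nothing.
total_n = sum(f(n) for n in range(1, 101))
total_j = sum(f(j) for j in range(1, 101))

def sentence_10(m: int) -> bool:
    """m is free here (the caller must supply it); a and b are bound."""
    return all(a * b != m or a == 1 or b == 1
               for a in range(1, m + 1) for b in range(1, m + 1))

def sentence_10_renamed(m: int) -> bool:
    """The same sentence with the bound variables renamed to c and d."""
    return all(c * d != m or c == 1 or d == 1
               for c in range(1, m + 1) for d in range(1, m + 1))
```

It makes sense to ask for which m the call `sentence_10(m)` returns `True`, but there is no corresponding question about a, which has no meaning outside the expression that binds it.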

4 Levels of Formality 

It is a surprising fact that a small number of set-theoretic
concepts and logical terms can be used to provide
a precise language that is versatile enough to express 
all the statements of ordinary mathematics. There are 
some technicalities to sort out, but even these can often 
be avoided if one allows not just sets but also numbers 
as basic objects. However, if you look at a well-written 
mathematics paper, then much of it will be written not
in symbolic language peppered with symbols such as
∀ and ∃, but in what appears to be ordinary English.
(Some papers are written in other languages, particularly 
French, but English has established itself as the interna- 
tional language of mathematics.) How can mathemati- 
cians be confident that this ordinary English does not 
lead to confusion, ambiguity, and even incorrectness? 

The answer is that the language typically used is a 
careful compromise between fully colloquial English, 
which would indeed run the risk of being unacceptably 
imprecise, and fully formal symbolism, which would be 
a nightmare to read. The ideal is to write in as friendly 
and approachable a way as possible, while making sure 
that the reader (who, one assumes, has plenty of experi- 
ence and training in how to read mathematics) can see 
easily how what one writes could be made more for- 
mal if it became important to do so. And sometimes it 
does become important: when an argument is difficult 
to grasp it may be that the only way to convince oneself 
that it is correct is to rewrite it more formally. 

Consider, for example, the following reformulation of 
the principle of mathematical induction, which underlies 
many proofs: 

(15) Every nonempty set of positive integers has a least 
element. 

If we wish to translate this into a more formal language
we need to strip it of words and phrases such
as “nonempty” and “has.” But this is easily done. To say
that a set A of positive integers is nonempty is simply
to say that there is a positive integer that belongs to A.
This can be stated symbolically:

(16) ∃n ∈ N n ∈ A.

What does it mean to say that A has a least element? 
It means that there exists an element x of A such that 
every element y of A is either greater than x or equal to 
x itself. This formulation is again ready to be translated 
into symbols: 

(17) ∃x ∈ A ∀y ∈ A (y > x) ∨ (y = x).

Statement (15) says that (16) implies (17) for every set A 
of positive integers. Thus, it can be written symbolically 
as follows: 

(18) ∀A ⊂ N
[(∃n ∈ N n ∈ A)
⇒ (∃x ∈ A ∀y ∈ A (y > x) ∨ (y = x))].

Here we have two very different modes of presentation 
of the same mathematical fact. Obviously (15) is much
easier to understand than (18). But if, for example, one 
is concerned with the foundations of mathematics, or 
wishes to write a computer program that checks the 
correctness of proofs, then it is better to work with a 
greatly pared-down grammar and vocabulary, and then 
(18) has the advantage. In practice, there are many dif- 
ferent levels of formality, and mathematicians are adept 
at switching between them. It is this that makes it pos- 
sible to feel completely confident in the correctness of 
a mathematical argument even when it is not presented 
in the manner of (18)— though it is also this that allows 
mistakes to slip through the net from time to time. 
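One advantage of the pared-down form is that it can be transcribed almost symbol for symbol into a program. The Python sketch below does this for statements (16), (17), and (18), with one loudly stated assumption: the infinite set N is replaced by the small finite stand-in {1, ..., 6}, so the check illustrates the transcription and proves nothing about N itself. All names are invented for the example.

```python
from itertools import combinations

def is_nonempty(A, universe):
    """(16): there exists n in the universe such that n belongs to A."""
    return any(n in A for n in universe)

def has_least_element(A):
    """(17): there exists x in A such that every y in A has y > x or y = x."""
    return any(all(y > x or y == x for y in A) for x in A)

UNIVERSE = range(1, 7)  # finite stand-in for N, chosen only for illustration

# Every subset of the finite universe, including the empty set.
subsets = [set(c) for r in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, r)]

# (18), restricted to subsets of the finite universe:
# for every A, (16) implies (17).
statement_18 = all((not is_nonempty(A, UNIVERSE)) or has_least_element(A)
                   for A in subsets)
```

The empty set satisfies the implication vacuously: it fails (16), so nothing is required of it in (17).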


1.3 Some Fundamental Mathematical 
Definitions 


The concepts discussed in this article occur throughout 
so much of modern mathematics that it would be inappropriate
to discuss them in part III: they are too basic.
Many later articles will assume at least some acquain- 
tance with these concepts, so if you have not met them, 
then reading this article will help you to understand 
significantly more of the book. 


1 The Main Number Systems 

Almost always, the first mathematical concept that a 
child is exposed to is the idea of numbers, and num- 
bers retain a central place in mathematics at all levels. 
However, it is not as easy as one might think to say 
what the word “number” means: the more mathemat- 
ics one learns, the more uses of this word one comes 
to know, and the more sophisticated one’s concept of 
number becomes. This individual development parallels 
a historical development that took many centuries (see
FROM NUMBERS TO NUMBER SYSTEMS [II.1]).

The modern view of numbers is that they are best
regarded not individually but as parts of larger wholes,
called number systems; the distinguishing features of
number systems are the arithmetical operations— such 
as addition, multiplication, subtraction, division, and 
extraction of roots— that can be performed on them. 
This view of numbers is very fruitful and provides a 
springboard into abstract algebra. The rest of this sec- 
tion gives a brief description of the five main number 
systems. 

1.1 The Natural Numbers 

The natural numbers, otherwise known as the positive 
integers, are the numbers familiar even to young chil- 
dren: 1, 2, 3, 4, and so on. It is the natural numbers that 
we use for the very basic mathematical purpose of count- 
ing. The set of all natural numbers is usually denoted 
N. 

Of course, the phrase “1, 2, 3, 4, and so on” does not 
constitute a formal definition, but it does suggest the 
following basic picture of the natural numbers, one that 
we tend to take for granted. 

(i) Given any natural number n there is another, n + 1, 
that comes next— known as the successor of n. 

(ii) A list that starts with 1 and follows each number 
by its successor will include every natural number 
exactly once and nothing else.

This picture is encapsulated by the Peano axioms
[III.69].

Given two natural numbers m and n one can add them 
together or multiply them, obtaining in each case a new 
natural number. By contrast, subtraction and division 
are not always possible. If we want to give meaning to 
expressions such as 8 - 13 or |, then we must work in 
a larger number system. 

1.2 The Integers

The natural numbers are not the only whole numbers, 
since they do not include zero or negative numbers, both 
of which are indispensable to mathematics. One of the 
first reasons for introducing zero was that it is needed 
for the normal decimal notation of positive integers— 
how else could one conveniently write 1005? However, 
it is now thought of as much more than just a conve- 
nience, and the property that makes it significant is that 
it is an additive identity, which means that adding zero to 
any number leaves that number unchanged. And while 
it is not particularly interesting to do to a number some- 
thing that has no effect, the property itself is interest- 
ing and distinguishes zero from all other numbers. An 
immediate illustration of this is that it allows us to think 
about negative numbers: if n is a positive integer, then 
the defining property of -n is that when you add it to n 
you get zero. 

Somebody with little mathematical experience may 
unthinkingly assume that numbers are for counting and 
find negative numbers objectionable because the answer 
to a question beginning “How many” is never negative. 
However, simple counting is not the only use for num- 
bers, and there are many situations that are naturally 
modeled by a number system that includes both posi- 
tive and negative numbers. For example, negative num- 
bers are sometimes used for the amount of money in 
a bank account, for temperature (in degrees Celsius or 
Fahrenheit), and for altitude compared with sea level. 

The set of all integers— positive, negative, and zero— 
is usually denoted Z (for the German word “Zahlen,” 
meaning “numbers”). Within this system, subtraction is 
always possible: that is, if m and n are integers, then so 
is m-n. 

1.3 The Rational Numbers 

So far we have considered only whole numbers. If we 
form all possible fractions as well, then we obtain the 
rational numbers. The set of all rational numbers is 
denoted Q (for “quotients”). 

One of the main uses of numbers besides counting is 
measurement, and most quantities that we measure are 
ones that can vary continuously, such as length, weight, 
temperature, and velocity. For these, whole numbers are 
inadequate. 

A more theoretical justification for the rational numbers
is that they form a number system in which division
is always possible, except by zero. This fact, together
with some basic properties of the arithmetical operations,
means that Q is a field. What fields are and why
they are important will be explained in more detail later
(section 2.2).

1.4 The Real Numbers 

A famous discovery of the ancient Greeks, often 
attributed, despite very inadequate evidence, to the 
school of Pythagoras [VI.1], was that the square root
of 2 is not a rational number. That is, there is no fraction
p/q such that (p/q)² = 2. The Pythagorean theorem
about right-angled triangles (which was probably
known at least a thousand years before Pythagoras) tells
us that if a square has sides of length 1, then the length
of its diagonal is √2. Consequently, there are lengths
that cannot be measured by rational numbers. 

This argument seems to give strong practical reasons 
for extending our number system still further. However, 
such a conclusion can be resisted: after all, we cannot 
make any measurements with infinite precision, so in 
practice we round off to a certain number of decimal 
places, and as soon as we have done so we have presented
our measurement as a rational number. (This
point is discussed more fully in numerical analysis
[IV.20].)

Nevertheless, the theoretical arguments for going 
beyond the rational numbers are irresistible. If we 
want to solve polynomial equations, take logarithms
[III.25 §4], do trigonometry, or work with the Gaussian
distribution [III.73 §5], to give just four examples
from an almost endless list, then irrational numbers will
appear everywhere we look. They are not used directly
for the purposes of measurement, but they are needed 
if we want to reason theoretically about the physical 
world by describing it mathematically. This necessarily 
involves a certain amount of idealization: it is far more 
convenient to say that the length of the diagonal of a 
unit square is √2 than it is to talk about what would be
observed, and with what degree of certainty, if one tried 
to measure this length as accurately as possible. 

The real numbers can be thought of as the set of 
all numbers with a finite or infinite decimal expansion. 
In the latter case, they are defined not directly but by
a process of successive approximation. For example,
the squares of the numbers 1, 1.4, 1.41, 1.414, 1.4142,
1.41421, ... get as close as you like to 2, if you go far
enough along the sequence, which is what we mean by
saying that the square root of 2 is the infinite decimal
1.41421....
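The process of successive approximation can be followed with exact rational arithmetic. This Python sketch (variable names are illustrative) writes each decimal truncation as a fraction and records how far its square falls short of 2.

```python
from fractions import Fraction

# Decimal truncations of the square root of 2, held as exact rationals.
truncations = [Fraction(s) for s in
               ["1", "1.4", "1.41", "1.414", "1.4142", "1.41421"]]

# How far each square falls short of 2, with no rounding error.
gaps = [2 - t * t for t in truncations]

# Every gap is positive (no truncation squares to exactly 2) and the gaps shrink.
all_positive = all(g > 0 for g in gaps)
strictly_shrinking = all(x > y for x, y in zip(gaps, gaps[1:]))
```

Each approximation is a rational number whose square is close to 2 but never equal to it, which is consistent with the fact that no fraction has square exactly 2.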

The set of all real numbers is denoted R. A more
abstract view of R is that it is an extension of the rational
number system to a larger field, and in fact the only one
possible in which processes of the above kind always
give rise to numbers that themselves belong to R.

Because real numbers are intimately connected with 
the idea of limits (of successive approximations), a true 
appreciation of the real number system depends on an 
understanding of mathematical analysis, which will be 
discussed in section 5. 

1.5 The Complex Numbers 

Many polynomial equations, such as the equation x² =
2, do not have rational solutions but can be solved in R.
However, there are many other equations that cannot be
solved even in R. The simplest example is the equation
x² = -1, which has no real solution since the square
of any real number is positive or zero. In order to get
around this problem, mathematicians introduce a symbol,
i, which they treat as a number, and they simply stipulate
that i² is to be regarded as equal to -1. The complex
number system, denoted C, is the set of all numbers of
the form a + bi, where a and b are real numbers. To
add or multiply complex numbers, one treats i as a variable
(like x, say), but any occurrences of i² are replaced
by -1. Thus,

(a + bi) + (c + di) = (a + c) + (b + d) i 

and 

(a + bi)(c + di) = ac + bci + adi + bdi²
= (ac - bd) + (bc + ad)i.
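These two rules are all that is needed to compute with complex numbers. As a minimal sketch, the Python code below represents a + bi by the pair (a, b); the representation and the function names are choices made for this illustration only.

```python
def add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def multiply(z, w):
    """(a + bi)(c + di) = (ac - bd) + (bc + ad)i, using i^2 = -1."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, b * c + a * d)

i = (0, 1)                            # the number i itself
i_squared = multiply(i, i)            # (-1, 0), that is, -1
sample_sum = add((1, 2), (3, 4))      # (1 + 2i) + (3 + 4i)
sample_product = multiply((1, 2), (3, 4))  # (1 + 2i)(3 + 4i)
```

Python's built-in `complex` type follows the same rules, so `complex(1, 2) * complex(3, 4)` gives the same real and imaginary parts as `sample_product`.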

There are several remarkable points to note about this 
definition. First, despite its apparently artificial nature, 
it does not lead to any inconsistency. Secondly, although 
complex numbers do not directly count or measure any- 
thing, they are immensely useful. Thirdly, and perhaps 
most surprisingly, even though the number i was introduced
to help us solve just one equation, it in fact allows
us to solve all polynomial equations. This is the famous 
FUNDAMENTAL THEOREM OF ALGEBRA [V.15]. 

One explanation for the utility of complex numbers
is that they provide a concise way to talk about many
aspects of geometry, via Argand diagrams. These represent
complex numbers as points in the plane, the number
a + bi corresponding to the point with coordinates
(a, b). If r = √(a² + b²) and θ = tan⁻¹(b/a), then
a = r cos θ and b = r sin θ. It turns out that multiplying
a complex number z = x + yi by a + bi corresponds to
the following geometrical process. First, you associate
z with the point (x, y) in the plane. Next, you multiply
this point by r, obtaining the point (rx, ry). Finally,
you rotate this new point counterclockwise about the
origin through an angle of θ. In other words, the effect
on the complex plane of multiplication by a + bi is to
dilate it by r and then rotate it by θ. In particular, if
a² + b² = 1, then multiplying by a + bi corresponds to
rotating by θ.
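The geometrical description of multiplication can be compared numerically with the algebraic one. In the Python sketch below, the function name and the sample values are arbitrary; note also that `cmath.polar` computes the angle with a two-argument arctangent, which agrees with the inverse tangent of b/a when a is positive.

```python
import cmath
import math

def dilate_then_rotate(z: complex, w: complex) -> complex:
    """Scale z by r = |w|, then rotate counterclockwise by the angle of w."""
    r, theta = cmath.polar(w)
    x, y = z.real * r, z.imag * r                  # dilation by r
    return complex(x * math.cos(theta) - y * math.sin(theta),
                   x * math.sin(theta) + y * math.cos(theta))  # rotation

z, w = complex(2, 1), complex(1, 1)   # arbitrary sample values
direct = z * w                        # algebraic rule
geometric = dilate_then_rotate(z, w)  # dilate by |w|, rotate by arg(w)
```

The two results agree up to floating-point rounding, so they should be compared with `cmath.isclose` rather than with exact equality.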

For this reason, polar coordinates are at least as good
as Cartesian coordinates for representing complex numbers:
an alternative way to write a + bi is re^{iθ}, which tells
us that the number has distance r from the origin and is
positioned at an angle θ around from the positive part of
the real axis (in a counterclockwise direction). If z = re^{iθ}
with r > 0, then r is called the modulus of z, denoted
by |z|, and θ is the argument of z. (Since adding 2π
to θ does not change e^{iθ}, it is usually understood that
0 ≤ θ < 2π, or sometimes that -π ≤ θ < π.) One final
useful definition: if z = x + yi is a complex number, then
its complex conjugate, written z̄, is the number x - yi.
It is easy to check that z z̄ = x² + y² = |z|².
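Modulus, argument, and conjugate are all available in Python's standard library, which gives a quick way to experiment with these definitions. The sample value below is arbitrary. One caveat: `cmath.polar` returns an argument between -π and π, so it follows a convention close to the second of the two mentioned above, differing only in which endpoint is included.

```python
import cmath

z = complex(3, 4)                        # arbitrary sample value
modulus, argument = cmath.polar(z)       # r and theta with z = r e^{i theta}
rebuilt = cmath.rect(modulus, argument)  # back from polar to Cartesian form
product_with_conjugate = z * z.conjugate()  # should equal |z|^2 = 3^2 + 4^2
```

Here the product of z with its conjugate is 25, the square of the modulus 5.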

2 Four Important Algebraic Structures 

In the previous section it was emphasized that numbers 
are best thought of not as individual objects but as mem- 
bers of number systems. A number system consists of 
some objects (numbers) together with operations (such 
as addition and multiplication) that can be performed 
on those objects. As such, it is an example of an alge- 
braic structure. However, there are many very important 
algebraic structures that are not number systems, and a 
few of them will be introduced here. 

2.1 Groups 

If S is a geometrical shape, then a rigid motion of S 
is a way of moving S in such a way that the distances 
between the points of S are not changed— squeezing and 
stretching are not allowed. A rigid motion is a symme- 
try of S if, after it is completed, S looks the same as it 
did before it moved. For example, if S is an equilateral 
triangle, then rotating S through 120° about its center 
is a symmetry; so is reflecting S about a line that passes 
through one of the vertices of S and the midpoint of the 
opposite side. 

More formally, a symmetry of S is a function f from S
to itself such that the distance between any two points
x and y of S is the same as the distance between the
transformed points f(x) and f(y).

This idea can be hugely generalized: if S is any mathematical
structure, then a symmetry of S is a function
from S to itself that preserves its structure. If S is a
geometrical shape, then the mathematical structure that
should be preserved is the distance between any two of
its points. But there are many other mathematical structures
that a function may be asked to preserve, most
notably algebraic structures of the kind that will soon be 
discussed. It is fruitful to draw an analogy with the geo- 
metrical situation and regard any structure-preserving 
function as a sort of symmetry. 

Because of its extreme generality, symmetry is an all- 
pervasive concept within mathematics; and wherever 
symmetries appear, structures known as groups fol- 
low close behind. To explain what these are and why 
they appear, let us return to the example of an equi- 
lateral triangle, which has, as it turns out, six possible 
symmetries. 

Why is this? Well, let f be a symmetry of an equilateral
triangle with vertices A, B, and C and suppose for convenience
that this triangle has sides of length 1. Then
f(A), f(B), and f(C) must be three points of the triangle
and the distances between these points must all
be 1. It follows that f(A), f(B), and f(C) are distinct
vertices of the triangle, since the furthest apart any two
points can be is 1 and this happens only when the two
points are distinct vertices. So f(A), f(B), and f(C) are
the vertices A, B, and C in some order. But the number of
possible orders of A, B, and C is 6. It is not hard to show
that, once we have chosen f(A), f(B), and f(C), the rest
of what f does is completely determined. (For example,
if X is the midpoint of A and C, then f(X) must be the
midpoint of f(A) and f(C) since there is no other point
at distance 1/2 from f(A) and f(C).)

Let us refer to these symmetries by writing down in 
order what happens to the vertices A, B, and C. So, for 
instance, the symmetry ACB is the one that leaves the 
vertex A fixed and exchanges B and C, which is achieved 
by reflecting the triangle in the line that joins A to the 
midpoint of B and C. There are three reflections like this: 
ACB, CBA, and BAC. There are also two rotations: BCA 
and CAB. Finally, there is the “trivial” symmetry, ABC, 
which leaves all points where they were originally. (The 
“trivial” symmetry is useful in much the same way as 
zero is useful for the algebra of integer addition.) 

What makes these and other sets of symmetries into 
groups is that any two symmetries can be composed, 
meaning that one symmetry followed by another pro- 
duces a third (since if two operations both preserve a 
structure then their combination clearly does too). For 
example, if we follow the reflection BAC by the reflection 
ACB, then we obtain the rotation CAB. To work this out, 
one can either draw a picture or use the following kind 
of reasoning: the first symmetry takes A to B and the sec- 
ond takes B to C, so the combination takes A to C, and 
similarly B goes to A, and C to B. Notice that the order
in which we perform the symmetries matters: if we had
started with the reflection ACB and then done the reflec- 
tion BAC, then we would have obtained the rotation BCA. 
(If you try to see this by drawing a picture, it is impor- 
tant to think of A, B, and C as labels that stay where they 
are rather than moving with the triangle— they mark 
positions that the vertices can occupy.) 
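The reasoning used to compose two symmetries is easy to mechanize. In the Python sketch below, a symmetry is stored as the three-letter string used above, so that "BAC" sends A to B, B to A, and C to C; the function name `compose` is a choice made for this illustration.

```python
VERTICES = "ABC"

def compose(first: str, second: str) -> str:
    """Perform `first`, then `second`, and name the combined symmetry."""
    image = {}
    for v in VERTICES:
        after_first = first[VERTICES.index(v)]               # where first sends v
        image[v] = second[VERTICES.index(after_first)]       # then apply second
    return "".join(image[v] for v in VERTICES)

reflections_one_order = compose("BAC", "ACB")    # BAC followed by ACB
reflections_other_order = compose("ACB", "BAC")  # ACB followed by BAC
reflection_twice = compose("ACB", "ACB")         # the same reflection twice
```

The first composition gives the rotation CAB and the second gives the rotation BCA, as described in the text, while doing a reflection twice gives the trivial symmetry ABC.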

We can think of symmetries as “objects” in their own 
right, and of composition as an algebraic operation, a bit 
like addition or multiplication for numbers. The opera- 
tion has the following useful properties: it is associa- 
tive, the trivial symmetry is an identity element, and 
every symmetry has an inverse. (See binary operations 
[1.2 §2.4]. For example, the inverse of a reflection is itself, 
since doing the same reflection twice leaves the triangle 
where it started.) More generally, any set with a binary 
operation that has these properties is called a group. It 
is not part of the definition of a group that the binary 
operation should be commutative, since, as we have just 
seen, if one is composing two symmetries then it often 
makes a difference which one goes first. However, if it is 
commutative then the group is called Abelian, after the 
Norwegian mathematician Niels Henrik Abel [VI.32]. The
number systems Z, Q, R, and C all form Abelian groups
with the operation of addition, or under addition, as one
usually says. If you remove zero from Q, R, and C, then
they form Abelian groups under multiplication, but Z
does not because of a lack of inverses: the reciprocal of 
an integer is not usually an integer. Further examples of 
groups will be given later in this section. 
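Because the triangle has only six symmetries, the defining properties of a group can be verified for it exhaustively. The Python sketch below does so under the same string encoding of symmetries as in the text; the constant and function names are invented for the example.

```python
from itertools import permutations, product

VERTICES = "ABC"
SYMMETRIES = ["".join(p) for p in permutations(VERTICES)]  # all six orders
IDENTITY = "ABC"

def compose(first: str, second: str) -> str:
    """Perform `first`, then `second`."""
    return "".join(second[VERTICES.index(first[k])] for k in range(3))

pairs = list(product(SYMMETRIES, repeat=2))

closed = all(compose(s, t) in SYMMETRIES for s, t in pairs)
associative = all(
    compose(compose(s, t), u) == compose(s, compose(t, u))
    for s, t, u in product(SYMMETRIES, repeat=3)
)
has_identity = all(
    compose(IDENTITY, s) == s == compose(s, IDENTITY) for s in SYMMETRIES
)
# For each symmetry, find the symmetry that undoes it.
inverses = {
    s: next(t for t in SYMMETRIES
            if compose(s, t) == IDENTITY == compose(t, s))
    for s in SYMMETRIES
}
commutative = all(compose(s, t) == compose(t, s) for s, t in pairs)
```

The first three checks succeed and every symmetry has an inverse, so the six symmetries form a group; the last check fails, so this group is not Abelian.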

2.2 Fields 

Although several number systems form groups, to 
regard them merely as groups is to ignore a great deal of 
their algebraic structure. In particular, whereas a group 
has just one binary operation, the standard number 
systems have two, namely addition and multiplication 
(from which further ones, such as subtraction and divi- 
sion, can be derived). The formal definition of a field is 
quite long: it is a set with two binary operations and 
there are several axioms that these operations must 
satisfy. Fortunately, there is an easy way to remember 
these axioms. You just write down all the basic proper- 
ties you can think of that are satisfied by addition and 
multiplication in the number systems Q, R, and C.

These properties are as follows. Both addition and 
multiplication are commutative and associative, and 
both have identity elements (0 for addition and 1 for 
multiplication). Every element x has an additive inverse 
-x and a multiplicative inverse 1/x (except that 0 does
not have a multiplicative inverse). It is the existence of
these inverses that allows us to define subtraction and
division: x - y means x + (-y) and x/y means x · (1/y).

That covers all the properties that addition and mul- 
tiplication satisfy individually. However, a very general 
rule when defining mathematical structures is that if a 
definition splits into parts, then the definition as a whole 
will not be interesting unless those parts interact. Here 
our two parts are addition and multiplication, and the
properties mentioned so far do not relate them in any
way. But one final property, known as the distributive
law, does this, and thereby gives fields their special
character. This is the rule that tells us how to multiply out
brackets: x(y + z) = xy + xz for any three numbers x,
y, and z.

Having listed these properties, one may then view the 
whole situation abstractly by regarding the properties as 
axioms and saying that a field is any set with two binary
operations that satisfy all those axioms. However, when
one works in a field, one usually thinks of the axioms not
as a list of statements but rather as a general license to 
do all the algebraic manipulations that one can do when 
talking about rational, real, and complex numbers. 

Clearly, the more axioms one has, the harder it is to
find a mathematical structure that satisfies them, and
it is indeed the case that fields are harder to come by
than groups. For this reason, the best way to understand
fields is probably to concentrate on examples. In addition
to Q, R, and C, one other field stands out as fundamental,
namely $F_p$, which is the set of integers modulo
a prime p, with addition and multiplication also defined
modulo p (see modular arithmetic [III.60]).
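
To see why the modulus needs to be prime, one can tabulate multiplicative inverses modulo a small number. This is a minimal sketch, assuming Python 3.8 or later (where the built-in `pow` accepts an exponent of -1 together with a modulus); the function name `inverses_mod` is hypothetical.

```python
def inverses_mod(n):
    """Map each nonzero residue a modulo n to its inverse, or to None."""
    table = {}
    for a in range(1, n):
        try:
            # pow(a, -1, n) is the residue b with a * b == 1 (mod n).
            table[a] = pow(a, -1, n)
        except ValueError:
            # Raised when a has no inverse modulo n.
            table[a] = None
    return table
```

For n = 7 every nonzero residue receives an inverse, whereas for n = 6 the residues 2, 3, and 4 do not, so the integers modulo 6 fail the field axioms.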

What makes fields interesting, however, is not so
much the existence of these basic examples as the fact
that there is an important process of extension that
allows one to build new fields out of old ones. The idea
is to start with a field F, find a polynomial P that has
no roots in F, and “adjoin” a new element to F with
the stipulation that it is a root of P. This produces an
extended field F', which consists of everything that one
can produce from this root and from elements of F using 
addition and multiplication. 

We have already seen an important example of this
process: in the field R, the polynomial $P(x) = x^2 + 1$ has
no root, so we adjoined the element i and let C be the
field of all combinations of the form a + bi.

We can apply exactly the same process to the field $F_3$,
in which again the equation $x^2 + 1 = 0$ has no solution.
If we do so, then we obtain a new field, which, like
C, consists of all combinations of the form a + bi, but
now a and b belong to $F_3$. Since $F_3$ has three elements,
this new field has nine elements. Another example is the
field $Q(\sqrt{2})$, which consists of all numbers of the form
$a + b\sqrt{2}$, where now a and b are rational numbers. A
slightly more complicated example is $Q(\gamma)$, where $\gamma$ is
a root of the polynomial $x^3 - x - 1$. A typical element
of this field has the form $a + b\gamma + c\gamma^2$, with a, b, and c
rational. If one is doing arithmetic in $Q(\gamma)$, then whenever
$\gamma^3$ appears, it can be replaced by $\gamma + 1$ (because
$\gamma^3 - \gamma - 1 = 0$), just as $i^2$ can be replaced by $-1$ in
the complex numbers. For more on why field extensions
are interesting, see the discussion of automorphisms
in section 4.1.
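
The nine-element field obtained by adjoining i to the integers modulo 3 is small enough to enumerate. The following sketch is illustrative only; it assumes the pair (a, b) stands for a + bi, and the names `add`, `mul`, and `inverse` are hypothetical helpers.

```python
from itertools import product

P = 3  # we work with the integers modulo 3

# The pair (a, b) stands for a + bi, with a and b in {0, 1, 2}.
ELEMENTS = list(product(range(P), repeat=2))
ZERO, ONE = (0, 0), (1, 0)


def add(u, v):
    """Add coordinate by coordinate, modulo 3."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def mul(u, v):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def inverse(u):
    """Find the multiplicative inverse of u by exhaustive search."""
    for v in ELEMENTS:
        if mul(u, v) == ONE:
            return v
    return None
```

With these definitions `mul((0, 1), (0, 1))` is `(2, 0)`, that is, i squared is -1 modulo 3, and `inverse` finds a partner for each of the eight nonzero elements.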

A second very significant justification for introducing
fields is that they can be used to form vector spaces, and
it is to these that we now turn. 

2.3 Vector Spaces 

One of the most convenient ways to represent points in 
a plane that stretches out to infinity in all directions is
to use Cartesian coordinates. One chooses an origin and 
two directions X and Y, usually at right angles to each 
other. Then the pair of numbers (a, b) stands for the 
point you reach in the plane if you go a distance a in 
direction X and a distance b in direction Y (where if a 
is a negative number such as -2, this is interpreted as 
going a distance +2 in the opposite direction to X, and 
similarly for b). 

Another way of saying the same thing is this. Let x 
and y stand for the unit vectors in directions X and 
Y, respectively, so their Cartesian coordinates are (1, 0) 
and (0, 1). Then every point in the plane is a so-called 
linear combination ax + by of the basis vectors x and 
y. To interpret the expression ax + by, first rewrite it
as a(1, 0) + b(0, 1). Then a times the unit vector (1, 0)
is (a, 0) and b times the unit vector (0, 1) is (0, b), and
when you add (a, 0) and (0, b) coordinate by coordinate
you get the vector (a, b). 

Here is another situation where linear combinations 
appear. Suppose you are presented with the differential 
equation $d^2y/dx^2 + y = 0$, and happen to know (or
notice) that y = sin x and y = cos x are two possible
solutions. Then you can easily check that y = a sin x +
b cos x is a solution for any pair of numbers a and b.
That is, any linear combination of the existing solutions
sin x and cos x is another solution. It turns out that all
solutions are of this form, so we can regard sin x and
cos x as “basis vectors” for the “space” of solutions of
the differential equation. 

Linear combinations occur in many contexts
throughout mathematics. To give one more example,
an arbitrary polynomial of degree 3 has the form
$ax^3 + bx^2 + cx + d$, which is a linear combination of the
four basic polynomials $1$, $x$, $x^2$, and $x^3$.

A vector space is a mathematical structure in which the 
notion of linear combination makes sense. The objects 
that belong to the vector space are usually called vec- 
tors, unless we are talking about a specific example and 
are thinking of them as concrete objects such as poly- 
nomials or solutions of a differential equation. Slightly 
more formally, a vector space is a set V such that, given 
any two vectors v and w (that is, elements of V) and 
any two real numbers a and b, we can form the linear 
combination av + bw. 

Notice that this linear combination involves objects of 
two different kinds, the vectors v and w and the num- 
bers a and b. The latter are known as scalars. The oper- 
ation of forming linear combinations can be broken up 
into two constituent parts: addition and scalar multipli- 
cation. To form the combination av + bw, first multiply 
the vectors v and w by the scalars a and b, obtaining the 
vectors av and bw, and then add these resulting vectors 
to obtain the full combination av + bw. 

The definition of linear combination must obey certain 
natural rules. Addition of vectors must be commutative 
and associative, with an identity, the zero vector, and 
inverses for each v (written -v). Scalar multiplication 
must obey a sort of associative law, namely that a(bv) 
and (ab)v are always equal. We also need two distributive
laws: (a + b)v = av + bv and a(v + w) = av + aw
for any scalars a and b and any vectors v and w. 

Another context in which linear combinations arise, 
one that lies at the heart of the usefulness of vector
spaces, is the solution of simultaneous equations. Sup- 
pose one is presented with the two equations 3x + 2y = 
6 and x - y = 7. The usual way to solve such a pair of 
equations is to try to eliminate either x or y by adding 
an appropriate multiple of one of the equations to the 
other: that is, by taking a certain linear combination 
of the equations. In this case, we can eliminate y by 
adding twice the second equation to the first, obtain- 
ing the equation 5x = 20, which tells us that x = 4 and 
hence that y = -3. Why were we allowed to combine
equations like this? Well, let us write $L_1$ and $R_1$ for the
left- and right-hand sides of the first equation, and similarly
$L_2$ and $R_2$ for the second. If, for some particular
choice of x and y, it is true that $L_1 = R_1$ and $L_2 = R_2$,
then clearly $L_1 + 2L_2 = R_1 + 2R_2$, as the two sides of this
equation are merely giving different names to the same 
numbers. 
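
The elimination step can be written out as a linear combination of the two equations. This is a small sketch under the assumption that an equation ax + by = c is stored as the triple (a, b, c); the helper `combine` is a hypothetical name.

```python
from fractions import Fraction


def combine(eq1, eq2, k):
    """Return eq1 + k * eq2, where (a, b, c) stands for ax + by = c."""
    return tuple(p + k * q for p, q in zip(eq1, eq2))


first = (Fraction(3), Fraction(2), Fraction(6))    # 3x + 2y = 6
second = (Fraction(1), Fraction(-1), Fraction(7))  # x - y = 7

# Adding twice the second equation to the first removes y.
eliminated = combine(first, second, 2)             # 5x + 0y = 20
x = eliminated[2] / eliminated[0]
# Substitute x back into x - y = 7 and solve for y.
y = (second[2] - second[0] * x) / second[1]
```

Exact fractions are used so that the arithmetic involves no rounding; the values obtained for `x` and `y` satisfy both of the original equations.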

Given a vector space V, a basis is a collection of vectors
$v_1, v_2, \dots, v_n$ with the following property: every vector
in V can be written in exactly one way as a linear combination
$a_1v_1 + a_2v_2 + \cdots + a_nv_n$. There are two ways in
which this can fail: there may be a vector that cannot be
written as a linear combination of $v_1, v_2, \dots, v_n$ or there
may be a vector that can be so expressed, but in more
than one way. If every vector is a linear combination then
we say that the vectors $v_1, v_2, \dots, v_n$ span V, and if no
vector is a linear combination in more than one way then
we say that they are independent. An equivalent definition
is that $v_1, v_2, \dots, v_n$ are independent if the only way
of writing the zero vector as $a_1v_1 + a_2v_2 + \cdots + a_nv_n$
is by taking $a_1 = a_2 = \cdots = a_n = 0$.

The number of elements in a basis is called the dimen- 
sion of V. It is not immediately obvious that there could 
not be two bases of different sizes, but it turns out that 
there cannot, so the concept of dimension makes sense. 
For the plane, the vectors x and y defined earlier formed 
a basis, so the plane, as one would hope, has dimen- 
sion 2. If we were to take more than two vectors, then 
they would no longer be independent: for example, if 
we take the vectors (1, 2), (1, 3), and (3, 1), then we can
write (0, 0) as the linear combination 8(1, 2) - 5(1, 3) -
(3, 1). (To work this out one must solve some simultaneous
equations; this is typical of calculations in vector
spaces.)
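
The simultaneous equations behind this dependence can be solved directly. The sketch below uses Cramer's rule, which is one convenient method for two unknowns and is not taken from the text; the function name `coefficients` is hypothetical.

```python
from fractions import Fraction


def coefficients(v1, v2, w):
    """Solve a*v1 + b*v2 = w for (a, b) by Cramer's rule.

    v1 and v2 are vectors in the plane and must be independent.
    """
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        raise ValueError("v1 and v2 are not independent")
    a = Fraction(w[0] * v2[1] - v2[0] * w[1], det)
    b = Fraction(v1[0] * w[1] - w[0] * v1[1], det)
    return a, b
```

Calling `coefficients((1, 2), (1, 3), (3, 1))` expresses (3, 1) in terms of (1, 2) and (1, 3); moving (3, 1) to the other side gives a combination of the three vectors that equals (0, 0).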

The most obvious n-dimensional vector space is the 
space of all sequences $(x_1, \dots, x_n)$ of n real numbers.
To add this to a sequence $(y_1, \dots, y_n)$ one simply forms
the sequence $(x_1 + y_1, \dots, x_n + y_n)$ and to multiply it
by a scalar c one forms the sequence $(cx_1, \dots, cx_n)$.
This vector space is denoted $R^n$. Thus, the plane with
its usual coordinate system is $R^2$ and three-dimensional
space is $R^3$.

It is not in fact necessary for the number of vectors
in a basis to be finite. A vector space that does not have
a finite basis is called infinite dimensional. This is not
an exotic property: many of the most important vector
spaces, particularly spaces where the “vectors” are
functions, are infinite dimensional.

There is one final remark to make about scalars. They 
were defined earlier as real numbers that one uses to 
make linear combinations of vectors. But it turns out 
that the calculations one does with scalars, in particu- 
lar solving simultaneous equations, can all be done in a 
more general context. What matters is that they should 
belong to a field, so Q, R, and C can all be used as systems
of scalars, as indeed can more general fields. If the
scalars for a vector space V come from a field F, then one
says that V is a vector space over F. This generalization
is important and useful: see, for example, algebraic
numbers [IV.3 §17].

2.4 Rings

Another algebraic structure that is very important is a 
ring. Rings are not quite as central to mathematics as 
groups, fields, or vector spaces, so a proper discussion 
of them will be deferred to rings, ideals, and modules
[III.82]. However, roughly speaking, a ring is an
algebraic structure that has most, but not necessarily 
all, of the properties of a field. In particular, the require- 
ments of the multiplicative operation are less strict. The 
most important relaxation is that nonzero elements of 
a ring are not required to have multiplicative inverses; 
but sometimes multiplication is not even required to 
be commutative. If it is, then the ring itself is said to 
be commutative; a typical example of a commutative
ring is the set Z of all integers. Another is the set of all
polynomials with coefficients in some field F.

3 Creating New Structures Out of Old Ones 

An important first step in understanding the defini- 
tion of some mathematical structure is to have a sup- 
ply of examples. Without examples, a definition is dry 
and abstract. With them, one begins to have a feeling 
for the structure that its definition alone cannot usually 
provide.

One reason for this is that it makes it much easier 
to answer basic questions. If you have a general state- 
ment about structures of a given type and want to know 
whether it is true, then it is very helpful if you can test 
it in a wide range of particular cases. If it passes all 
the tests, then you have some evidence in favor of the 
statement. If you are lucky, you may even be able to 
see why it is true; alternatively, you may find that the 
statement is true for each example you try, but always 
for reasons that depend on particular features of the 
example you are examining. Then you will know that you 
should try to avoid these features if you want to find a 
counterexample. If you do find a counterexample, then 
the general statement is false, but it may still happen 
that a modification to the statement is true and useful. 
In that case, the counterexample will help you to find an 
appropriate modification. 

The moral, then, is that examples are important. So 
how does one find them? There are two completely dif- 
ferent approaches. One is to build them from scratch. 
For example, one might define a group G to be the group 
of all symmetries of an icosahedron. Another, which is 
the main topic of this section, is to take some already 
constructed examples and build new ones out of them. 
For example, the group $Z^2$, which consists of all pairs
of integers (x, y), with addition defined by the obvious
rule (x, y) + (x', y') = (x + x', y + y'), is a “product”
of two copies of the group Z. As we shall see, this notion 
of product is very general and can be applied in many 
other contexts. But first let us look at an even more basic 
method of finding new examples. 

3.1 Substructures 

As we saw earlier, the set C of all complex numbers, with 
the operations of addition and multiplication, forms one 
of the most basic examples of a field. It also contains 
many subfields: that is, subsets that themselves form
fields. Take, for example, the set Q(i) of all complex 
numbers of the form a + bi for which a and b are rational. 
This is a subset of C and is also a field. To show this, one 
must prove that Q(i) is closed under addition, multipli- 
cation, and the taking of inverses. That is, if z and w 
are elements of Q(i), then z + w and zw must be as 
well, as must -z and 1/z (this last requirement applying
only when z ≠ 0). Axioms such as the commutativity
and associativity of addition and multiplication are then 
true in Q(i) for the simple reason that they are true in 
the larger set C. 

Even though Q(i) is contained in C, it is a more inter- 
esting field in some important ways. But how can this 
be? Surely, one might think, an object cannot become 
more interesting when most of it is taken away. But a 
moment’s further thought shows that it certainly can:
for example, the set of all prime numbers contains fas- 
cinating mysteries of a kind that one does not expect 
to encounter in the set of all positive integers. As for
fields, the fundamental theorem of algebra [V.15]
tells us that every polynomial equation has a solution in 
C. This is very definitely not true in Q(i). So in Q(i), and 
in many other fields of a similar kind, we can ask which 
polynomial equations have solutions. This turns out to 
be a deep and important question that simply does not 
arise in the larger field C. 

In general, given an example X of an algebraic structure,
a substructure of X is a subset Y that has relevant
closure properties. For instance, groups have subgroups,
vector spaces have subspaces, rings have subrings
(and also ideals [III.82]), and so on. If the property
defining the substructure Y is a sufficiently interesting 
one, then Y may well be significantly different from X 
and may therefore be a useful addition to one’s stock of 
examples. 

This discussion has focused on algebra, but interest- 
ing substructures abound in analysis and geometry as 
well. For example, the plane $R^2$ is not a particularly
interesting set, but it has subsets, such as the Mandelbrot
set [IV.15 §2.8], to give just one example, that are still
far from fully understood.

3.2 Products 

Let G and H be two groups. The product group G × H has
as its elements all pairs of the form (g, h) such that g
belongs to G and h belongs to H. This definition shows
how to build the elements of G × H out of the elements of
G and the elements of H. But to define a group we need
to do more: we are given binary operations on G and
H and we must use them to build a binary operation on
G × H. If $g_1$ and $g_2$ are elements of G, let us write $g_1g_2$ for
the result of applying G's binary operation to them, as is
customary, and let us do the same for H. Then there is
an obvious binary operation we can define on the pairs,
namely

$$(g_1, h_1)(g_2, h_2) = (g_1g_2, h_1h_2).$$

That is, one applies the binary operation from G to the 
first coordinate and the binary operation from H to the 
second. 

One can form products of vector spaces in a very sim- 
ilar way. If V and W are two vector spaces, then the ele- 
ments of V x W are all pairs of the form (v,w) with v 
in V and w in W. Addition and scalar multiplication are 
defined by the formulas 

$$(v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2)$$

and

$$\lambda(v, w) = (\lambda v, \lambda w).$$

The dimension of the resulting space is the sum of the
dimensions of V and W. (It is actually more usual to
denote this space by V ⊕ W and call it the direct sum
of V and W. Nevertheless, it is a product construction.)

It is not always possible to define product structures 
in this simple way. For example, if $F_1$ and $F_2$ are two
fields, we might be tempted to define a “product field”
$F_1 \times F_2$ using the formulas

$$(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)$$

$$(x_1, y_1)(x_2, y_2) = (x_1x_2, y_1y_2).$$

However, with this definition we do not obtain a field.
Most of the axioms hold, including the existence of additive
and multiplicative identities (they are (0, 0) and
(1, 1), respectively), but the nonzero element (1, 0) does
not have a multiplicative inverse, since (1, 0)(x, y) =
(x, 0), which can never equal (1, 1).

Occasionally we can define more complicated binary
operations that do make the set $F_1 \times F_2$ into a field. For
instance, if $F_1 = F_2 = R$, then we can define addition as
above, but define multiplication in a less obvious way as
follows:

$$(x_1, y_1)(x_2, y_2) = (x_1x_2 - y_1y_2, x_1y_2 + x_2y_1).$$

Then we obtain C, the field of complex numbers, since
the pair (x, y) can be identified with the complex number
x + iy. However, this is not a product field in the
general sense we are discussing. 
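
The two multiplication rules on pairs can be compared side by side. The sketch below is an illustration with hypothetical function names; it uses pairs of integers so that the comparison with Python's built-in complex numbers is exact.

```python
def componentwise(u, v):
    """The naive rule (x1, y1)(x2, y2) = (x1*x2, y1*y2)."""
    return (u[0] * v[0], u[1] * v[1])


def twisted(u, v):
    """The rule (x1, y1)(x2, y2) = (x1*x2 - y1*y2, x1*y2 + x2*y1)."""
    return (u[0] * v[0] - u[1] * v[1], u[0] * v[1] + v[0] * u[1])


def as_complex(u):
    """Identify the pair (x, y) with the complex number x + iy."""
    return complex(u[0], u[1])
```

Under the componentwise rule, multiplying by (1, 0) always produces a second coordinate of 0, so (1, 1) is out of reach. Under the second rule, `as_complex(twisted(u, v))` agrees with `as_complex(u) * as_complex(v)`.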

Returning to groups, what we defined earlier was the
direct product of G and H. However, there are other,
more complicated products of groups, which can be
used to give a much richer supply of examples. To illustrate
this, let us consider the dihedral group $D_4$, which is
the group of all symmetries of a square, of which there
are eight. If we let R stand for one of the reflections and
T for a counterclockwise quarter turn, then every symmetry
can be written in the form $T^iR^j$, where i is 0, 1,
2, or 3 and j is 0 or 1. (Geometrically, this says that you
can produce any symmetry by either rotating through a
multiple of 90° or reflecting and then rotating.)

This suggests that we might be able to regard $D_4$ as
a product of the group $\{I, T, T^2, T^3\}$, consisting of four
rotations, with the group $\{I, R\}$, consisting of the identity
I and the reflection R. We could even write $(T^i, R^j)$
instead of $T^iR^j$. However, we have to be careful. For
instance, (TR)(TR) does not equal $T^2R^2 = T^2$ but I.
The correct rule for multiplication can be deduced from
the fact that $RTR = T^{-1}$ (which in geometrical terms is
saying that if you reflect the square, rotate it counterclockwise
through 90°, and reflect back, then the result
is a clockwise rotation through 90°). It turns out to be

$$(T^i, R^j)(T^{i'}, R^{j'}) = (T^{i + (-1)^j i'}, R^{j + j'}).$$

For example, the product of (T, R) with $(T^3, R)$ is $T^{-2}R^2$,
which equals $T^2$.

This is a simple example of a “semi-direct product” of 
two groups. In general, given two groups G and H, there
may be several interesting ways of defining a binary 
operation on the set of pairs ( g,h ), and therefore several 
potentially interesting new groups. 
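
The multiplication rule for the symmetries of the square can be tested against actual permutations of the corners. The sketch below rests on assumptions that are not in the text: the corners are numbered 0 to 3 counterclockwise, T sends corner position x to x + 1, R is the reflection sending x to -x (fixing positions 0 and 2), and a product is read as "apply the right-hand factor first". All names are hypothetical.

```python
N = 4  # corner positions of the square, numbered 0..3 counterclockwise


def element(i, j):
    """Permutation of corner positions given by T^i R^j (R^j applied first)."""
    sign = -1 if j % 2 else 1
    return tuple((sign * x + i) % N for x in range(N))


def compose(p, q):
    """The product pq as a map: apply q first, then p."""
    return tuple(p[q[x]] for x in range(N))


def multiply(pair1, pair2):
    """The rule (T^i, R^j)(T^i', R^j') = (T^(i + (-1)^j i'), R^(j + j'))."""
    (i, j), (i2, j2) = pair1, pair2
    sign = -1 if j % 2 else 1
    return ((i + sign * i2) % N, (j + j2) % 2)


# All pairs (i, j) with i in {0, 1, 2, 3} and j in {0, 1}.
PAIRS = [(i, j) for i in range(N) for j in range(2)]
```

With this model the eight pairs give eight distinct permutations, and composing the permutations for two pairs always agrees with the permutation of the pair returned by `multiply`. In particular `multiply((1, 1), (1, 1))` is `(0, 0)`, the identity.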

3.3 Quotients 

Let us write Q[x] for the set of all polynomials in the 
variable x with rational coefficients: that is, expressions 
like $2x^4 - \tfrac{3}{2}x + 6$. Any two such polynomials can be
added, subtracted, or multiplied together and the result 
will be another polynomial. This makes Q[x] into a commutative
ring, but not a field, because if you divide one
polynomial by another then the result is not (necessarily) 
a polynomial.

We will now convert Q[x] into a field in what may at
first seem a rather strange way: by regarding the polynomial
$x^3 - x - 1$ as “equivalent” to the zero polynomial. To
put this another way, whenever a polynomial involves $x^3$
we will allow ourselves to replace $x^3$ by $x + 1$, and we will
regard the new polynomial that results as equivalent to
the old one. For example, writing “$\sim$” for “is equivalent
to”:

$$x^5 = x^3x^2 \sim (x + 1)x^2 = x^3 + x^2 \sim x + 1 + x^2 = x^2 + x + 1.$$

Notice that in this way we can convert any polynomial 
into one of degree at most 2, since whenever the degree 
is higher, you can reduce it by taking out $x^3$ from the
term of highest degree and replacing it by $x + 1$, just as
we did above. 

Notice also that whenever we do such a replacement, 
the difference between the old polynomial and the new 
one is a multiple of $x^3 - x - 1$. For example, when we
replaced $x^3x^2$ by $(x + 1)x^2$ the difference was
$(x^3 - x - 1)x^2$. Therefore, what our process amounts to is
this: two polynomials are equivalent if and only if their
difference is a multiple of the polynomial $x^3 - x - 1$.
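
The replacement procedure is easy to mechanize. The sketch below is one possible implementation, assuming a polynomial is stored as a list of coefficients with `coeffs[k]` the coefficient of x^k; the name `reduce_cubic` is hypothetical.

```python
def reduce_cubic(coeffs):
    """Rewrite a polynomial using x^3 -> x + 1 until its degree is at most 2.

    coeffs[k] is the coefficient of x^k; the result has exactly three entries.
    """
    c = list(coeffs)
    while len(c) > 3:
        top = c.pop()   # coefficient of x^d, where d is now len(c)
        d = len(c)
        # x^d = x^(d-3) * x^3 is replaced by x^(d-3) * (x + 1),
        # which is x^(d-2) + x^(d-3).
        c[d - 2] += top
        c[d - 3] += top
    return c + [0] * (3 - len(c))
```

For instance, `reduce_cubic([0, 0, 0, 0, 0, 1])`, which starts from x^5, returns `[1, 1, 1]`, the coefficients of 1 + x + x^2.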

Now the reason Q[x] was not a field was that nonconstant
polynomials do not have multiplicative inverses.
For example, it is obvious that one cannot multiply $x^2$
by a polynomial and obtain the polynomial 1. However,
we can obtain a polynomial that is equivalent to 1 if we
multiply by $1 + x - x^2$. Indeed, the product of the two is

$$x^2 + x^3 - x^4 \sim x^2 + x + 1 - (x + 1)x = 1.$$

It turns out that all polynomials that are not equivalent 
to zero (that is, are not multiples of $x^3 - x - 1$) have
multiplicative inverses in this generalized sense. (To find an
inverse for a polynomial P one applies the generalized
Euclid algorithm [III.22] to find polynomials Q and R
such that $PQ + R(x^3 - x - 1) = 1$. The reason we obtain
1 on the right-hand side is that $x^3 - x - 1$ cannot be
factorized in Q[x] and P is not a multiple of $x^3 - x - 1$,
so their highest common factor is 1. The inverse of P is 
then Q.) 
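
The procedure in the parenthetical remark can be sketched in code. What follows is one possible implementation of the extended Euclidean algorithm for polynomials with rational coefficients, not a definitive one; polynomials are assumed to be coefficient lists with the constant term first, and all function names are hypothetical.

```python
from fractions import Fraction

MODULUS = [-1, -1, 0, 1]  # x^3 - x - 1, constant term first


def trim(p):
    """Convert to exact fractions and drop zero leading coefficients."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(p, d):
    """Return (quotient, remainder) with p = quotient * d + remainder."""
    r, d = trim(p), trim(d)
    q = [Fraction(0)] * max(len(r) - len(d) + 1, 0)
    while len(r) >= len(d):
        c = r[-1] / d[-1]
        k = len(r) - len(d)
        q[k] = c
        for i, b in enumerate(d):
            r[i + k] -= c * b   # cancels the leading term of r
        r = trim(r)
    return trim(q), r


def inverse_mod(p, m=MODULUS):
    """Find Q with P*Q equivalent to 1 modulo m (m assumed irreducible)."""
    # Invariant: r0 - s0 * p and r1 - s1 * p are multiples of m.
    r0, r1 = trim(m), trim(p)
    s0, s1 = [], [Fraction(1)]
    while r1:
        quo, rem = divmod_poly(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, add(s0, mul([-c for c in quo], s1))
    if len(r0) != 1:
        raise ValueError("polynomial is not invertible modulo m")
    return divmod_poly([c / r0[0] for c in s0], m)[1]
```

For P = x^2, that is `inverse_mod([0, 0, 1])`, the result is `[1, 1, -1]`, the coefficients of 1 + x - x^2 found above.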

In what sense does this mean that we have a field?
After all, the product of $x^2$ and $1 + x - x^2$ was not 1: it was
merely equivalent to 1. This is where the notion of quotients
comes in. We simply decide that when two polynomials
are equivalent, we will regard them as equal,
and we denote the resulting mathematical structure by
$Q[x]/(x^3 - x - 1)$. This structure turns out to be a
field, and it turns out to be important as the smallest
field that contains Q and also has a root of the polynomial
$X^3 - X - 1$. What is this root? It is simply x.
This is a slightly subtle point because we are now thinking
of polynomials in two different ways: as elements
of $Q[x]/(x^3 - x - 1)$ (at least when equivalent ones
are regarded as equal), and also as functions defined on
$Q[x]/(x^3 - x - 1)$. So the polynomial $X^3 - X - 1$ is not the
zero polynomial, since for example it takes the value 5
when $X = 2$ and the value
$x^6 - x^2 - 1 \sim (x + 1)^2 - x^2 - 1 \sim 2x$ when $X = x^2$.

You may have noticed a strong similarity between the 
discussion of the field $Q[x]/(x^3 - x - 1)$ and the discussion
of the field $Q(\gamma)$ at the end of section 2.2. And
indeed, this is no coincidence: they are two different
ways of describing the same field. However, thinking of
the field as $Q[x]/(x^3 - x - 1)$ brings significant advantages,
as it converts questions about a mysterious set of com- 
plex numbers into more approachable questions about 
polynomials. 

What does it mean to “regard two mathematical 
objects as equal” when they are not equal? A formal
answer to this question uses the notion of equivalence
relations and equivalence classes (discussed in the language
and grammar of mathematics [I.2 §2.3]): one
says that the elements of $Q[x]/(x^3 - x - 1)$ are not in
fact polynomials but equivalence classes of polynomials.
However, to understand the notion of a quotient it is
much easier to look at an example with which we are
all familiar, namely the set Q of rational numbers. If we
are trying to explain carefully what a rational number is,
then we may start by saying that a typical rational number
has the form a/b, where a and b are integers and b
is not 0. And it is possible to define the set of rational
numbers to be the set of all such expressions, with the 
rules 

$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$$

and

$$\frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}.$$

However, there is one very important further remark 
we must make, which is that we do not regard all such 
expressions as different: for example, 1/2 and 3/6 are supposed
to be the same rational number. So we define two
expressions a/b and c/d to be equivalent if ad = bc and
we regard equivalent expressions as denoting the same 
number. Notice that the expressions can be genuinely 
different, but we think of them as denoting the same 
object. 

If we do this, then we must be careful whenever we 
define functions and binary operations. For example, 
suppose we tried to define a binary operation “∘” on Q
by the natural-looking formula

$$\frac{a}{b} \circ \frac{c}{d} = \frac{a + c}{b + d}.$$

This definition turns out to have a very serious flaw. To 
see why, let us apply it to the fractions 1/2 and 1/3. Then it
gives us the answer 2/5. Now let us replace 1/2 by the equivalent
fraction 3/6 and apply the formula again. This time it
gives us the answer 4/9, which is different. Thus, although
the formula defines a perfectly good binary operation on
the set of expressions of the form a/b, it does not make
any sense as a binary operation on the set of rational 
numbers. 
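
The failure can be reproduced by working with expressions rather than numbers. In this sketch an expression a/b is assumed to be stored as the pair (a, b); the function names are hypothetical and the particular fractions are chosen purely for illustration.

```python
def same_number(e1, e2):
    """Expressions a/b and c/d denote the same number when ad = bc."""
    return e1[0] * e2[1] == e1[1] * e2[0]


def circle(e1, e2):
    """The natural-looking rule: add numerators and add denominators."""
    return (e1[0] + e2[0], e1[1] + e2[1])


def plus(e1, e2):
    """Ordinary addition of expressions: a/b + c/d = (ad + bc)/bd."""
    return (e1[0] * e2[1] + e1[1] * e2[0], e1[1] * e2[1])


half, three_sixths, third = (1, 2), (3, 6), (1, 3)
```

Here `same_number(half, three_sixths)` holds, yet `circle(half, third)` and `circle(three_sixths, third)` do not denote the same number, whereas `plus(half, third)` and `plus(three_sixths, third)` do.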

In general, it is essential to check that if you put equiv- 
alent objects in then you get equivalent objects out. For 
example, when defining addition and multiplication for 
the field $Q[x]/(x^3 - x - 1)$, one must check that if P and
P' differ by a multiple of $x^3 - x - 1$, and Q and Q' also
differ by a multiple of $x^3 - x - 1$, then so do P + Q and
P' + Q', and so do PQ and P'Q'. This is an easy exercise.

Why is the word “quotient” used? Well, a quotient is 
normally what you get when you divide one number 
by another, so to understand the analogy let us think 
about dividing 21 by 3. We can think of this as divid- 
ing up twenty-one objects into sets of three objects 
each and asking how many sets we get. This can be 
described in terms of equivalence as follows. Let us call 
two objects equivalent if they belong to the same one of 
the seven sets. Then there can be at most seven inequivalent
objects. So when we regard equivalent objects as
the same, we “divide out by the equivalence,” obtaining 
a “quotient set” that has seven elements. 

A rather different use of quotients leads to an elegant 
definition of the mathematical shape known as a torus: 
that is, the shape of the surface of a doughnut (of the 
kind that has a hole). We start with the plane, $R^2$, and
define two points (x, y) and (x', y') to be equivalent if
x - x' and y - y' are both integers. Suppose that we
regard any two equivalent points as the same and that
we start at a point (x, y) and move right until we reach
the point (x + 1, y). This point is “the same” as (x, y),
since the difference is (1,0). Therefore, it is as though 
the entire plane has been wrapped around a vertical 
cylinder of circumference 1 and we have gone around 
this cylinder once. If we now apply the same argument 
to the y-coordinate, noting that (x,y) is always “the 
same” point as (x, y + 1), then we find that this cylinder 
is itself “folded around” so that if you go “upwards” by 
a distance of 1 then you get back to where you started. 
But that is what a torus is: a cylinder that is folded back 
into itself. (This is not the only way of defining a torus,
however. For example, it can be defined as the product
of two circles.) 
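
One way to compute with points of this quotient is to pick a standard representative of each equivalence class. The sketch below assumes exact rational coordinates and chooses the representative in the unit square; `canonical` and `equivalent` are hypothetical names.

```python
from fractions import Fraction


def canonical(point):
    """Representative of the class of (x, y) with 0 <= x < 1 and 0 <= y < 1."""
    x, y = point
    return (Fraction(x) % 1, Fraction(y) % 1)


def equivalent(p, q):
    """Points are equivalent when both coordinate differences are integers."""
    return canonical(p) == canonical(q)
```

Moving right or upward by 1 does not change the value of `canonical`, which is the computational counterpart of going once around the torus.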

Many other important objects in modern geometry are 
defined using quotients. It often happens that the object 
one starts with is extremely big, but that at the same time 
the equivalence relation is very generous, in the sense 
that it is easy for one object to be equivalent to another. 
In that case the number of “genuinely distinct” objects 
can be quite small. This is a rather loose way of talking, 
since it is not really the number of distinct objects that is 
interesting so much as the complexity of the set of these 
objects. It might be better to say that one often starts 
with a hopelessly large and complicated structure but 
“divides out most of the mess” and ends up with a quo- 
tient object that has a structure that is simple enough 
to be manageable while still conveying important infor- 
mation. Good examples of this are the fundamental 
group [IV.10 §3] and the homology and cohomology 
groups [IV.10 §2] of a topological space; an even better 
example is the notion of a moduli space [IV.8]. 

Many people find the idea of a quotient somewhat dif- 
ficult to grasp, but it is of major importance throughout 
mathematics, which is why it has been discussed at some 
length here. 

4 Functions between Algebraic Structures 

One rule with almost no exceptions is that mathemat- 
ical structures are not studied in isolation: as well as 
the structures themselves one looks at certain functions 
defined on those structures. In this section we shall see 
which functions are worth considering, and why. (For a 
discussion of functions in general, see the language
and grammar of mathematics [I.2 §2.2].)

4.1 Homomorphisms, Isomorphisms, and Automorphisms

If X and Y are two examples of a particular mathemat- 
ical structure, such as a group, field, or vector space, 
then, as was suggested in the discussion of symmetry in 
section 2.1, there is a class of functions from X to Y of 
particular interest, namely the functions that “preserve 
the structure.” Roughly speaking, a function f : X → Y is
said to preserve the structure of X if, given any relation- 
ship between elements of X that is expressed in terms 
of that structure, there is a corresponding relationship 
between the images of those elements that is expressed 
in terms of the structure of Y. For example, if X and Y 
are groups and a, b, and c are elements of X such that 
ab = c, then, if f is to preserve the algebraic structure
of X, f(a)f(b) must equal f(c) in Y. (Here, as is usual,
we are using the same notation for the binary opera-
tions that make X and Y groups as is normally used 
for multiplication.) Similarly, if X and Y are fields, with 
binary operations that we shall write using the standard 
notation for addition and multiplication, then a function 
f : X → Y will be interesting only if f(a) + f(b) = f(c)
whenever a + b = c, and f(a)f(b) = f(c) whenever
ab = c. For vector spaces, the functions of interest are 
ones that preserve linear combinations: if V and W are 
vector spaces, then f(av + bw) should always equal 
af(v) + bf(w). 

A function that preserves structure is generally known 
as a homomorphism, though homomorphisms of par- 
ticular mathematical structures often have their own 
names: for example, a homomorphism of vector spaces 
is called a linear map. 

There are some useful properties that a homomor- 
phism may have if we are lucky. To see why further prop- 
erties can be desirable, consider the following example. 
Let X and Y be groups and let f : X → Y be the function
that takes every element of X to the identity element
e of Y. Then, according to the definition above, f pre-
serves the structure of X, since whenever ab = c, we
have f(a)f(b) = ee = e = f(c). However, it seems
more accurate to say that f has collapsed the struc-
ture. One can make this idea more precise: although
f(a)f(b) = f(c) whenever ab = c, the converse does
not hold: it is perfectly possible for f(a)f(b) to equal
f(c) without ab equaling c, and indeed that happens in
the example just given. 
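
For finite groups the structure-preserving condition can be checked by brute force. The following is only a sketch: the groups (integers modulo 6 and modulo 3 under addition) and the helper name `is_homomorphism` are assumptions chosen for illustration, not taken from the text.

```python
from itertools import product

def is_homomorphism(f, X, op_X, op_Y):
    """Check f(ab) == f(a)f(b) for every pair a, b in the finite group X."""
    return all(f(op_X(a, b)) == op_Y(f(a), f(b)) for a, b in product(X, repeat=2))

# Illustrative groups (assumptions): integers mod 6 and mod 3 under addition.
Z6 = range(6)
add_mod6 = lambda a, b: (a + b) % 6
add_mod3 = lambda a, b: (a + b) % 3

reduce_mod3 = lambda a: a % 3   # sends Z6 onto Z3
trivial = lambda a: 0           # sends everything to the identity of Z3

print(is_homomorphism(reduce_mod3, Z6, add_mod6, add_mod3))
print(is_homomorphism(trivial, Z6, add_mod6, add_mod3))

# The converse fails for the trivial map: f(a)f(b) == f(c) although ab != c.
a, b, c = 1, 2, 5
print(add_mod3(trivial(a), trivial(b)) == trivial(c), add_mod6(a, b) == c)
```

Both maps pass the check, but only the trivial one forgets everything about the group it started from, which is the sense in which it collapses the structure.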

An isomorphism between two structures X and Y is a 
homomorphism f : X → Y that has an inverse g : Y → X
that is also a homomorphism. For most algebraic struc-
tures, if f has an inverse g, then g is automatically a
homomorphism; in such cases we can simply say that
an isomorphism is a homomorphism that is also a bijec-
tion [I.2 §2.2]. That is, f is a one-to-one correspondence
between X and Y that preserves structure.¹

If X and Y are fields, then these considerations are 
less interesting: it is a simple exercise to show that every 
homomorphism f : X → Y is automatically an isomor-
phism between X and its image f(X), that is, the set of
all values taken by the function f. So structure cannot


1. Let us see how this claim is proved for groups. If X and Y are
groups, f : X → Y is a homomorphism with inverse g : Y → X, and
u, v, and w are elements of Y with uv = w, then we must show that
g(u)g(v) = g(w). To do this, let a = g(u), b = g(v), and d = g(w).
Since f and g are inverse functions, f(a) = u, f(b) = v, and f(d) =
w. Now let c = ab. Then w = uv = f(a)f(b) = f(c), since f is a
homomorphism. But then f(c) = f(d), which implies that c = d (just
apply the function g to f(c) and f(d)). Therefore ab = d, which tells
us that g(u)g(v) = g(w), as we needed to show.


be collapsed without being lost. (The proof depends on 
the fact that the zero in Y has no multiplicative inverse.)

In general, if there is an isomorphism between two 
algebraic structures X and Y, then X and Y are said to 
be isomorphic (coming from the Greek words for “same” 
and “shape”). Loosely, the word “isomorphic” means 
“the same in all essential respects,” where what counts 
as essential is precisely the algebraic structure. What is 
absolutely not essential is the nature of the objects that 
have the structure: for example, one group might consist 
of certain complex numbers, another of integers modulo 
a prime p, and a third of rotations of a geometrical fig- 
ure, and they could all turn out to be isomorphic. The 
idea that two mathematical constructions can have very 
different constituent parts and yet in a deeper sense be 
“the same” is one of the most important in mathematics. 

An automorphism of an algebraic structure X is an iso- 
morphism from X to itself. Since it is hardly surprising 
that X is isomorphic to itself, one might ask what the 
point is of automorphisms. The answer is that automor- 
phisms are precisely the algebraic symmetries alluded 
to in our discussion of groups. An automorphism of X 
is a function from X to itself that preserves the struc- 
ture (which now comes in the form of statements like 
ab = c). The composition of two automorphisms is 
clearly a third, and as a result the automorphisms of a 
structure X form a group. Although the individual auto- 
morphisms may not be of much interest, the group cer-
tainly is, as it often encapsulates what one really wants
to know about a structure X that is too complicated to
analyze directly.

A spectacular example of this is when X is a field. 
To illustrate, let us take the example of Q(√2). If f :
Q(√2) → Q(√2) is an automorphism, then f(1) = 1, as
we have seen, and then f(2) = f(1 + 1) = f(1) + f(1) =
1 + 1 = 2. Continuing like this, we can show that f(n) =
n for every positive integer n. Then f(n) + f(−n) =
f(n + (−n)) = f(0) = 0, so f(−n) = −f(n) = −n.
Finally, f(p/q) = f(p)/f(q) = p/q when p and q
are integers with q ≠ 0. So f takes every rational
number to itself. What can we say about f(√2)? Well,
f(√2)f(√2) = f(√2 · √2) = f(2) = 2, but this implies
only that f(√2) is √2 or −√2. It turns out that both
choices are possible: one automorphism is the “trivial”
one f(a + b√2) = a + b√2 and the other is the more
interesting one f(a + b√2) = a − b√2. This observa-
tion demonstrates that there is no algebraic difference
between the two square roots; in this sense, the field
Q(√2) does not know which square root of 2 is positive
and which negative. These two automorphisms form a 
group, which is isomorphic to the group consisting of
the elements ±1 under multiplication, or the group of
integers modulo 2, or the group of symmetries of an 
isosceles triangle that is not equilateral, or .... The list
is endless. 
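
The claim that a + b√2 ↦ a − b√2 preserves the field structure can be spot-checked with exact rational arithmetic. This is a sketch under stated assumptions: the pair representation (a, b) for a + b√2 and the function names `add`, `mul`, and `conj` are introduced here for illustration, and a finite sample of elements is tested rather than the whole field.

```python
from fractions import Fraction
from itertools import product

# An element a + b*sqrt(2) of Q(sqrt(2)) is stored as the pair (a, b).
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)

def conj(x):
    # The nontrivial map a + b√2 -> a - b√2
    return (x[0], -x[1])

samples = [(Fraction(p, 3), Fraction(q, 2)) for p, q in product(range(-2, 3), repeat=2)]

preserves_add = all(conj(add(x, y)) == add(conj(x), conj(y)) for x in samples for y in samples)
preserves_mul = all(conj(mul(x, y)) == mul(conj(x), conj(y)) for x in samples for y in samples)
print(preserves_add, preserves_mul)

sqrt2 = (Fraction(0), Fraction(1))
print(mul(sqrt2, sqrt2) == (2, 0), mul(conj(sqrt2), conj(sqrt2)) == (2, 0))
print(all(conj(conj(x)) == x for x in samples))  # conj applied twice is the identity
```

The last line reflects the two-element group structure: applying the nontrivial automorphism twice gives the trivial one, just as (−1)(−1) = 1.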

The automorphism groups associated with certain 
field extensions are called Galois groups [III.30], and
are a vital component of the proof of the insolubil-
ity of the quintic [V.24], as well as of large parts
of algebraic number theory (see algebraic numbers
[IV.3]).

4.2 Linear Maps and Matrices 

Homomorphisms between vector spaces have a distinc- 
tive geometrical property: they send straight lines to 
straight lines. For this reason they are called linear maps, 
as was mentioned in the previous subsection. From a 
more algebraic point of view, the structure that linear 
maps preserve is that of linear combinations: a function 
f from one vector space to another is a linear map if
f(au + bv) = af(u) + bf(v) for every pair of vectors
u, v ∈ V and every pair of scalars a and b. From this
one can deduce the more general assertion that $f(a_1 v_1 + \cdots + a_n v_n)$ is always equal to $a_1 f(v_1) + \cdots + a_n f(v_n)$.

Suppose that we wish to define a linear map from V to 
W. How much information do we need to provide? This 
may seem a vague question, so here is a similar one. How 
much information is needed to specify a point in space? 
The answer is that, once one has devised a sensible coor- 
dinate system, three numbers will suffice. If the point is 
not too far from Earth’s surface then one might wish 
to use its latitude, its longitude, and its height above 
sea level, for instance. Can a linear map from V to W
similarly be specified by just a few numbers? 

The answer is that it can, at least if V and W are finite
dimensional. Suppose that V has a basis $v_1, \dots, v_n$, that
W has a basis $w_1, \dots, w_m$, and that f : V → W is the
linear map we would like to specify. Since every vector in
V can be written in the form $a_1 v_1 + \cdots + a_n v_n$ and since
$f(a_1 v_1 + \cdots + a_n v_n)$ is always equal to $a_1 f(v_1) + \cdots + a_n f(v_n)$, once we decide what $f(v_1), \dots, f(v_n)$ are we
have specified f completely. But each vector $f(v_j)$ is a
linear combination of the basis vectors $w_1, \dots, w_m$: that
is, it can be written in the form

$$f(v_j) = a_{1j} w_1 + \cdots + a_{mj} w_m.$$

Thus, to specify an individual $f(v_j)$ needs m numbers,
the scalars $a_{1j}, \dots, a_{mj}$. Since there are n different vec-
tors $v_j$, the linear map is determined by the mn num-
bers $a_{ij}$, where i runs from 1 to m and j from 1 to n.


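These numbers can be written in an array, as follows:

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}$$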

An array like this is called a matrix. It is important to 
note that a different choice of basis vectors for V and 
W would lead to a different matrix, so one often talks of 
the matrix of / relative to a given pair of bases (a basis 
for V and a basis for W). 
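
The recipe just described (record the coordinates of the image of each basis vector, one column per basis vector) can be sketched in a few lines. The map `f` below, from two-dimensional to three-dimensional real space with the standard bases, is a hypothetical example, as are the helper names `matrix_of` and `apply`.

```python
def matrix_of(f, n):
    """Matrix of a linear map f: R^n -> R^m relative to the standard bases.
    Column j holds the coordinates of f(e_j)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(f(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def apply(A, v):
    """Multiply the matrix A by the coordinate vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical linear map from R^2 to R^3, chosen only for illustration.
def f(v):
    x, y = v
    return [x + 2 * y, 3 * x, y]

A = matrix_of(f, 2)
print(A)                      # 3 rows, 2 columns: mn = 6 numbers
print(apply(A, [5, 7]) == f([5, 7]))
```

Once the six numbers are known, the map itself is no longer needed: `apply` recovers the image of any vector from the matrix alone.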

Now suppose that f is a linear map from V to W and
that g is a linear map from U to V. Then fg stands for
the linear map from U to W obtained by doing first g,
then f. If the matrices of f and g, relative to certain
bases of U, V, and W, are A and B, then what is the
matrix of fg? To work it out, one takes a basis vector
$u_k$ of U and applies to it the function g, obtaining a lin-
ear combination $b_{1k} v_1 + \cdots + b_{nk} v_n$ of the basis vectors
of V. To this linear combination one applies the function
f, obtaining a rather complicated linear combination
of linear combinations of the basis vectors $w_1, \dots, w_m$
of W.

Pursuing this idea, one can calculate that the entry in
row i and column j of the matrix P of fg is $a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}$. This matrix P is called the prod-
uct of A and B and is written AB. If you have not seen 
this definition then you will find it hard to grasp, but the 
main point to remember is that there is a way of calculat- 
ing the matrix for fg from the matrices A, B of f and g,
and that this matrix is denoted AB. Matrix multiplication 
of this kind is associative but not commutative. That is, 
A(BC) is always equal to (AB)C but AB is not necessar- 
ily the same as BA. The associativity follows from the 
fact that composition of the underlying linear maps is
associative: if A, B, and C are the matrices of f, g, and
h, respectively, then A(BC) is the matrix of the linear
map “do h-then-g, then f” and (AB)C is the matrix of
the linear map “do h, then g-then-f,” and these are the
same linear map. 
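
A small numerical check may make the product formula less forbidding. The sketch below uses three 2 × 2 integer matrices picked arbitrarily for illustration (they are assumptions, not from the text) and confirms that AB represents “do g, then f,” that the product is associative, and that it need not be commutative.

```python
def matmul(A, B):
    """Entry (i, j) of AB is a_i1*b_1j + a_i2*b_2j + ... + a_in*b_nj."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(A, v):
    """Multiply the matrix A by the coordinate vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Illustrative matrices (assumptions, not from the text).
A = [[1, 2], [0, 1]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]
u = [4, -1]

# AB is the matrix of "do g, then f": applying B and then A agrees with applying AB.
print(apply(matmul(A, B), u) == apply(A, apply(B, u)))
print(matmul(A, matmul(B, C)) == matmul(matmul(A, B), C))  # associative
print(matmul(A, B) == matmul(B, A))                        # not commutative here
```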

Let us now confine our attention to automorphisms 
from a vector space V to itself. These are linear maps f :
V → V that can be inverted; that is, for which there exists
a linear map g : V → V such that fg(v) = gf(v) = v
for every vector v in V. These we can think of as “sym-
metries” of the vector space V, and as such they form
a group under composition. If V is n dimensional and
the scalars come from the field F, then this group is
called $\mathrm{GL}_n(F)$. The letters “G” and “L” stand for “gen-
eral” and “linear”; some of the most important and dif-
ficult problems in mathematics arise when one tries to
understand the structure of the general linear groups
(and related groups) for certain interesting fields F (see
representation theory [IV.12]).

While matrices are very useful, many interesting linear 
maps are between infinite-dimensional vector spaces, 
and we close this section with two examples for the 
reader who is familiar with elementary calculus. (There 
will be a brief discussion of calculus later in this arti- 
cle.) For the first, let V be the set of all functions from 
R to R that can be differentiated and let W be the set 
of all functions from R to R. These can be made into 
vector spaces in a simple way: if f and g are func-
tions, then their sum is the function h defined by the
formula h(x) = f(x) + g(x), and if a is a real num-
ber then af is the function k defined by the formula
k(x) = af(x). (So, for example, we could regard the
polynomial $x^2 + 3x + 2$ as a linear combination of the
functions $x^2$, x, and the constant function 1.) Then dif-
ferentiation is a linear map (from V to W), since the
derivative (af + bg)' is af' + bg'. This is clearer if we
write Df for the derivative of f: then we are saying that
D(af + bg) = aDf + bDg.
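
Restricted to polynomials, the linearity of differentiation can be verified exactly. In this sketch a polynomial is stored as its list of coefficients, which is a representation chosen here for convenience; the second polynomial `g` and the scalars are arbitrary illustrative choices.

```python
from fractions import Fraction

# A polynomial c0 + c1*x + c2*x^2 + ... is stored as its coefficient list [c0, c1, c2, ...].
def D(p):
    """Derivative of a polynomial given by its coefficients."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def combine(a, p, b, q):
    """Coefficients of a*p + b*q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a * x + b * y for x, y in zip(p, q)]

f = [2, 3, 1]        # x^2 + 3x + 2, the polynomial mentioned above
g = [0, -1, 0, 4]    # 4x^3 - x (a hypothetical second function)
a, b = Fraction(1, 2), 5

print(D(f))   # coefficients of 2x + 3
print(D(combine(a, f, b, g)) == combine(a, D(f), b, D(g)))
```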

A second example uses integration. Let V be another 
vector space of functions, and let u be a function of two
variables. (The functions involved have to have certain 
properties for the definition to work, but let us ignore 
the technicalities.) Then we can define a linear map T on 
the space V by the formula 

$$(Tf)(x) = \int u(x, y) f(y) \, dy.$$

Definitions like this one can be hard to take in, because 
they involve holding in one’s mind three different lev- 
els of complexity. At the bottom we have real numbers, 
denoted by x and y. In the middle are functions like f,
u, and Tf, which turn real numbers (or pairs of them)
into real numbers. At the top is another function, T,
but the “objects” that it transforms are themselves func-
tions: it turns a function like f into a different function
Tf. This is just one example where it is important to 
think of a function as a single, elementary “thing” rather 
than as a process of transformation. (See the discussion 
of functions in the language and grammar of math-
ematics [I.2 §2.2].) Another remark that may help to
clarify the definition is that there is a very close analogy
between the role of the two-variable function u(x, y)
and the role of a matrix $a_{ij}$ (which can itself be thought
of as a function of the two integer variables i and j).
Functions like u are sometimes called kernels. For more
about linear maps between infinite-dimensional spaces,
see operator algebras [IV.19] and linear operators
[III.52].


4.3 Eigenvalues and Eigenvectors 

Let V be a vector space and let S : V → V be a linear
map from V to itself. An eigenvector of S is a nonzero
vector v in V such that Sv is proportional to v; that
is, Sv = λv for some scalar λ. The scalar in question
is called the eigenvalue corresponding to v. This sim- 
ple pair of definitions is extraordinarily important: it 
is hard to think of any branch of mathematics where 
eigenvectors and eigenvalues do not have a major part 
to play. But what is so interesting about Sv being pro- 
portional to v? A rather vague answer is that in many 
cases the eigenvectors and eigenvalues associated with 
a linear map contain all the information one needs about 
the map, and in a very convenient form. Another answer 
is that linear maps occur in many different contexts, and 
questions that arise in those contexts often turn out to 
be questions about eigenvectors and eigenvalues, as the 
following two examples illustrate. 

First, imagine that you are given a linear map T 
from a vector space V to itself and want to understand 
what happens if you perform the map repeatedly. One 
approach would be to pick a basis of V, work out the cor- 
responding matrix A of T and calculate the powers of A 
by matrix multiplication. The trouble is that the calcu- 
lation will be messy and uninformative, and it does not 
really give much insight into the linear map. 

However, it often happens that one can pick a very 
special basis, consisting only of eigenvectors, and in 
that case understanding the powers of T becomes easy. 
Indeed, suppose that the basis vectors are $v_1, v_2, \dots, v_n$
and that each $v_i$ is an eigenvector with corresponding
eigenvalue $\lambda_i$. That is, suppose that $T(v_i) = \lambda_i v_i$ for
every i. If w is any vector in V, then there is exactly one
way of writing it in the form $a_1 v_1 + \cdots + a_n v_n$, and then

$$T(w) = \lambda_1 a_1 v_1 + \cdots + \lambda_n a_n v_n.$$

Roughly speaking, this says that T stretches the part of
w in direction $v_i$ by a factor of $\lambda_i$. But now it is easy
to say what happens if we apply T not just once but m
times to w. The result will be

$$T^m(w) = \lambda_1^m a_1 v_1 + \cdots + \lambda_n^m a_n v_n.$$

In other words, now the amount by which we stretch in
the $v_i$ direction is $\lambda_i^m$, and that is all there is to it.
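
Here is a small sketch of the two ways of computing a repeated linear map. The 2 × 2 matrix is an assumption chosen because its eigenvectors (1, 1) and (1, −1), with eigenvalues 3 and 1, are easy to check by hand; the coefficients of w are likewise arbitrary.

```python
def apply(A, v):
    """Multiply the matrix A by the coordinate vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Illustrative matrix (an assumption): eigenvectors v1 = (1, 1) and v2 = (1, -1)
# with eigenvalues 3 and 1 respectively.
A = [[2, 1], [1, 2]]
v1, lam1 = [1, 1], 3
v2, lam2 = [1, -1], 1

a1, a2 = 4, -2                                    # w = a1*v1 + a2*v2
w = [a1 * x + a2 * y for x, y in zip(v1, v2)]

m = 10
# Slow way: apply the map m times.
repeated = w
for _ in range(m):
    repeated = apply(A, repeated)

# Fast way: stretch each eigen-direction by the m-th power of its eigenvalue.
direct = [lam1**m * a1 * x + lam2**m * a2 * y for x, y in zip(v1, v2)]
print(repeated == direct)
```

The second computation needs no matrix multiplication at all, and it also shows at a glance which direction dominates when m is large.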

Why should one be interested in doing linear maps 
over and over again? There are many reasons, but one 
fairly convincing one is that this sort of calculation is 
exactly what Google does in order to put Web sites into a 
useful order. Details can be found in the mathematics
of algorithm design [VII.5].

The second example concerns the interesting property
of the exponential function [III.25] $e^x$: that its deriva-
tive is the same function. In other words, if $f(x) = e^x$,
then f'(x) = f(x). Now differentiation, as we saw ear-
lier, can be thought of as a linear map, and if f'(x) =
f(x) then this map leaves the function f unchanged,
which says that f is an eigenvector with eigenvalue 1.
More generally, if $g(x) = e^{\lambda x}$, then $g'(x) = \lambda e^{\lambda x} = \lambda g(x)$, so g is an eigenvector of the differentiation map,
with eigenvalue λ. Many linear differential equations
can be thought of as asking for eigenvectors of lin- 
ear maps defined using differentiation. (Differentiation 
and differential equations will be discussed in the next 
section.) 

5 Basic Concepts of Mathematical Analysis 

Mathematics took a huge leap forward in sophistication 
with the invention of calculus, and the notion that one 
can specify a mathematical object indirectly by means of 
better and better approximations. These ideas form the 
basis of a broad area of mathematics known as analysis, 
and the purpose of this section is to help the reader who 
is unfamiliar with them. However, it will not be possible 
to do full justice to the subject, and what is written here 
will be hard to understand without at least some prior 
knowledge of calculus. 

5.1 Limits 

In our discussion of real numbers (section 1.4) there 
was a brief discussion of the square root of 2. How do 
we know that 2 has a square root? One answer is the 
one given there: that we can calculate its decimal expan- 
sion. If we are asked to be more precise, we may well 
end up saying something like this. The real numbers 1, 
1.4, 1.41, 1.414, 1.4142, 1.41421, . . . , which have termi- 
nating decimal expansions (and are therefore rational) 
approach another number x = 1.4142135.... We can-
not actually write down x properly because it has an
infinite decimal expansion but we can at least explain 
how its digits are defined: for example, the third digit 
after the decimal point is a 4 because 1.414 is the largest 
multiple of 0.001 that squares to less than 2. It follows 
that the squares of the original numbers, 1, 1.96, 1.9881, 
1.999396, 1.99996164, 1.9999899241, ..., approach 2, 
and this is why we are entitled to say that $x^2 = 2$.
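
The rule for the digits (take the largest multiple of the relevant power of ten whose square is less than 2) can be carried out with exact integer arithmetic. The following is a sketch; the function name `truncation` and the scaling by powers of ten are choices made here, not notation from the text.

```python
def truncation(k):
    """Largest integer t such that (t / 10**k)**2 < 2, found with exact integer arithmetic.
    Then t / 10**k is the decimal expansion of the square root of 2 cut off after k digits."""
    t = 10**k                      # 1.000...0 squares to less than 2
    while (t + 1) ** 2 < 2 * 10 ** (2 * k):
        t += 1
    return t

for k in range(6):
    t = truncation(k)
    print(k, t, t * t)   # read t with k digits, and t*t with 2k digits, after the decimal point
```

Reading the printed integers with the decimal point restored gives the approximations 1, 1.4, 1.41, ... and their squares 1, 1.96, 1.9881, ... listed above.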

Suppose that we are asked to determine the length of 
a curve drawn on a piece of paper, and that we are given 
a ruler to help us. We face a problem: the ruler is straight 
and the curve is not. One way of tackling the problem is 
as follows. First, draw a few points $P_0, P_1, P_2, \dots, P_n$ along
the curve, with $P_0$ at one end and $P_n$ at the other. Next,
measure the distance from $P_0$ to $P_1$, the distance from $P_1$
to $P_2$, and so on up to $P_n$. Finally, add all these distances
up. The result will not be an exactly correct answer, but if 
there are enough points, spaced reasonably evenly, and 
if the curve does not wiggle too much, then our pro- 
cedure will give us a good notion of the “approximate 
length” of the curve. Moreover, it gives us a way to define 
what we mean by the “exact length”: suppose that, as 
we take more and more points, we find that the approx- 
imate lengths, in the sense just defined, approach some 
number l. Then we say that l is the length of the curve. 
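
The ruler procedure is easy to imitate numerically. As a sketch, take a quarter of the unit circle as the curve (an assumption made here because its exact length, π/2, is known); the function names are illustrative, and floating-point arithmetic means the printed values are themselves approximations.

```python
import math

def polygonal_length(curve, n):
    """Sum of the straight-line distances between n + 1 points P_0, ..., P_n on the curve.
    `curve` maps a parameter t in [0, 1] to a point (x, y)."""
    points = [curve(i / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical curve for illustration: a quarter of the unit circle, whose length is pi/2.
def quarter_circle(t):
    angle = t * math.pi / 2
    return (math.cos(angle), math.sin(angle))

for n in (1, 2, 4, 8, 1000):
    print(n, polygonal_length(quarter_circle, n))
print(math.pi / 2)
```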

In both these examples, there is a number that we 
reach by means of better and better approximations. 
I used the word “approach” in both cases, but this is 
rather vague, and it is important to make it precise. Let
$a_1, a_2, a_3, \dots$ be a sequence of real numbers. What does
it mean to say that these numbers approach a specified
real number l?

The following two examples are worth bearing in
mind. The first is the sequence 1/2, 2/3, 3/4, 4/5, .... In a sense,
the numbers in this sequence approach 2, since each one
is closer to 2 than the one before, but it is clear that this
is not what we mean. What matters is not so much that
we get closer and closer, but that we get arbitrarily close,
and the only number that is approached in this stronger
sense is the obvious “limit,” 1.

A second sequence illustrates this in a different way:
1, 0, 1/2, 0, 1/3, 0, 1/4, 0, .... Here, we would like to say that the
numbers approach 0, even though it is not true that each
one is closer than the one before. Nevertheless, it is true
that eventually the sequence gets as close as you like to 
0 and remains at least that close. 

This last phrase serves as a definition of the mathe-
matical notion of a limit: the limit of the sequence of
numbers $a_1, a_2, a_3, \dots$ is l if eventually the sequence
gets as close as you like to l and remains that close.
However, in order to meet the standards of precision
demanded by mathematics, we need to know how to
translate English words like “eventually” into mathemat-
ics, and for this we need quantifiers [I.2 §3.2].

Suppose δ is a positive number (which one usually
imagines as small). Let us say that $a_n$ is δ-close to l if
$|a_n - l|$, the difference between $a_n$ and l, is less than δ.
What would it mean to say that eventually the sequence
gets δ-close to l and stays there? It means that from
some point onwards, all the $a_n$ are δ-close to l. And what
is the meaning of “from some point onwards”? It is that
there is some number N (the point in question) with the
property that $a_n$ is δ-close to l from N onwards, that is,
for every n that is greater than or equal to N. In symbols:

$$\exists N \ \forall n \geq N \quad a_n \text{ is } \delta\text{-close to } l.$$

It remains to capture the idea of “as close as you like.”
What this means is that the above sentence is true for
any δ you might wish to specify. In symbols:

$$\forall \delta > 0 \ \exists N \ \forall n \geq N \quad a_n \text{ is } \delta\text{-close to } l.$$

Finally, let us stop using the nonstandard phrase “δ-
close”:

$$\forall \delta > 0 \ \exists N \ \forall n \geq N \quad |a_n - l| < \delta.$$

This sentence is not particularly easy to understand. 
Unfortunately (and interestingly in the light of the dis-
cussion in [I.2 §4]), using a less symbolic language does
not necessarily make things much easier: “Whatever pos-
itive δ you choose, there is some number N such that for
all bigger numbers n the difference between $a_n$ and l is
less than δ.”
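
The quantifiers can be read as a challenge and a response: someone names a tolerance, and we must produce an N. The sketch below plays this game for the sequence whose n-th term is n/(n + 1), taken here as an illustrative example with limit 1. A computer can only test finitely many terms, so the check is a sanity test rather than a proof; the proof is the observation in the docstring.

```python
from fractions import Fraction

def a(n):
    """The sequence 1/2, 2/3, 3/4, ... whose n-th term is n / (n + 1)."""
    return Fraction(n, n + 1)

def find_N(delta):
    """Smallest N with |a_N - 1| < delta. Because |a_n - 1| = 1/(n + 1) decreases,
    every later term is then within delta of 1 as well."""
    N = 1
    while abs(a(N) - 1) >= delta:
        N += 1
    return N

for delta in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    N = find_N(delta)
    stays_close = all(abs(a(n) - 1) < delta for n in range(N, N + 5000))
    print(delta, N, stays_close)

# The terms also get closer and closer to 2, yet never come within 1 of it.
print(all(abs(a(n) - 2) > 1 for n in range(1, 5000)))
```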

The notion of limit applies much more generally than 
just to real numbers. If you have any collection of math- 
ematical objects and can say what you mean by the dis- 
tance between any two of those objects, then you can 
talk of a sequence of those objects having a limit. Two 
objects are now called δ-close if the distance between
them is less than δ, rather than the difference. (The
idea of distance is discussed further in metric spaces
[III.58].) For example, a sequence of points in space can
have a limit, as can a sequence of functions. (In the sec-
ond case it is less obvious how to define distance: there
are many natural ways to do it.) A further example comes
in the theory of fractals (see dynamics [IV.15]): the very
complicated shapes that appear there are best defined
as limits of simpler ones. 

Other ways of saying that the limit of the sequence
$a_1, a_2, \dots$ is l are to say that $a_n$ converges to l or that it
tends to l. One sometimes says that this happens as n
tends to infinity. Any sequence that has a limit is called
convergent. If $a_n$ converges to l then one often writes
$a_n \to l$.

5.2 Continuity 

Suppose you want to know the approximate value of $\pi^2$.
Perhaps the easiest thing to do is to press a π button
on a calculator, which displays 3.1415927, and then an
$x^2$ button, after which it displays 9.8696044. Of course,
one knows that the calculator has not actually squared
π: instead it has squared the number 3.1415927. (If it is
a good one, then it may have secretly used a few more
digits of π without displaying them, but not infinitely
many.) Why does it not matter that the calculator has
squared the wrong number? 


A first answer is that it was only an approximate value
of $\pi^2$ that was required. But that is not quite a complete
explanation: how do we know that if x is a good approx-
imation to π then $x^2$ is a good approximation to $\pi^2$?
Here is how one might show this. If x is a good approx-
imation to π, then we can write $x = \pi + \delta$ for some
very small number δ (which could be negative). Then
$x^2 = \pi^2 + 2\delta\pi + \delta^2$. Since δ is small, so is $2\delta\pi + \delta^2$, so
$x^2$ is indeed a good approximation to $\pi^2$.

What makes the above reasoning work is that the func- 
tion that takes a number x to its square is continuous. 
Roughly speaking, this means that if two numbers are 
close, then so are their squares. 

To be more precise about this, let us return to the cal-
culation of $\pi^2$, and imagine that we wish to work it out
to a much greater accuracy, so that the first hundred
digits after the decimal point are correct, for example.
A calculator will not be much help, but what we might
do is find a list of the digits of π (on the Internet you
can find sites that tell you at least the first fifty million),
use this to define a new x that is a much better approx-
imation to π, and then calculate the new $x^2$ by getting
a computer to do the necessary long multiplication. 

How close do we need x to be to π for $x^2$ to be within
$10^{-100}$ of $\pi^2$? To answer this, we can use our earlier
argument. Let $x = \pi + \delta$ again. Then $x^2 - \pi^2 = 2\delta\pi + \delta^2$,
and an easy calculation shows that this has modulus less
than $10^{-100}$ if δ has modulus less than $10^{-101}$. So we will
be all right if we take the first 101 digits of π after the
decimal point.
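
One way to spell out the “easy calculation” is as follows, writing δ for the error x − π and using only the crude bounds |δ| < 1 and 2π + 1 < 10 (these bounds are a choice made here; sharper ones would work equally well):

```latex
\[
|x^2 - \pi^2| = |2\delta\pi + \delta^2|
  \le |\delta|\,(2\pi + |\delta|)
  < 10^{-101}\,(2\pi + 1)
  < 10^{-101} \cdot 10 = 10^{-100}.
\]
```

Cutting off the decimal expansion of π after 101 digits changes it by less than $10^{-101}$, which is why 101 digits suffice.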

More generally, however accurate we wish our esti-
mate of $\pi^2$ to be, we can achieve this accuracy if we are
prepared to make x a sufficiently good approximation
to π. In mathematical parlance, the function $f(x) = x^2$
is continuous at π.

Let us try to say this more symbolically. The state-
ment “$x^2 = \pi^2$ to within an accuracy of ε” means that
$|x^2 - \pi^2| < \epsilon$. To capture the phrase “however accurate,”
we need this to be true for every positive ε, so we should
start by saying ∀ε > 0. Now let us think about the words
“if we are prepared to make x a sufficiently good approx-
imation to π.” The thought behind them is that there is
some δ > 0 for which the approximation is guaranteed
to be accurate to within ε as long as x is within δ of π.
That is, there exists a δ > 0 such that if |x − π| < δ
then it is guaranteed that $|x^2 - \pi^2| < \epsilon$. Putting every-
thing together, we end up with the following symbolic
sentence:

$$\forall \epsilon > 0 \ \exists \delta > 0 \ \bigl(|x - \pi| < \delta \Rightarrow |x^2 - \pi^2| < \epsilon\bigr).$$

To put that in words: “Given any positive number ε there
is a positive number δ such that if |x − π| is less than δ
then $|x^2 - \pi^2|$ is less than ε.” Earlier, we found a δ that
worked when ε was chosen to be $10^{-100}$: it was $10^{-101}$.

What we have just shown is that the function $f(x) = x^2$ is continuous at the point x = π. Now let us general-
ize this idea: let f be any function and let a be any real
number. We say that f is continuous at a if

$$\forall \epsilon > 0 \ \exists \delta > 0 \ \bigl(|x - a| < \delta \Rightarrow |f(x) - f(a)| < \epsilon\bigr).$$

This says that however accurate you wish f(x) to be as
an estimate for f(a), you can achieve this accuracy if
you are prepared to make x a sufficiently good approx-
imation to a. The function f is said to be continuous if
it is continuous at every a. Roughly speaking, what this
means is that f has no “sudden jumps.” (It also rules out
certain kinds of very rapid oscillations that would also 
make accurate estimates difficult.) 
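
The definition asks for a rule that turns each tolerance on the output into a tolerance on the input. For the squaring function such a rule can be written down explicitly. The sketch below uses a rational point a = 3 in place of π so that the arithmetic is exact; the formula in `delta_for` is one valid choice among many and is justified in its docstring, and only a finite sample of points is tested.

```python
from fractions import Fraction

def delta_for(a, eps):
    """A delta that works for f(x) = x**2 at the point a (one valid choice, not the largest).
    If |x - a| < delta <= 1 then |x + a| < 2|a| + 1, so
    |x**2 - a**2| = |x - a| * |x + a| < delta * (2|a| + 1) <= eps."""
    return min(Fraction(1), Fraction(eps) / (2 * abs(a) + 1))

a = Fraction(3)
for eps in (Fraction(1, 10), Fraction(1, 10**6)):
    delta = delta_for(a, eps)
    # Sample points strictly inside (a - delta, a + delta) and test the implication.
    xs = [a + delta * Fraction(k, 1000) for k in range(-999, 1000)]
    ok = all(abs(x**2 - a**2) < eps for x in xs)
    print(eps, delta, ok)
```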

As with limits, the idea of continuity applies in much 
more general contexts, and for the same reason. Let f 
be a function from a set X to a set Y (see the language
and grammar of mathematics [I.2 §2.2]), and suppose
that we have two notions of distance, one for elements of
X and the other for elements of Y. Using the expression
d(x, a) to denote the distance between x and a, and
similarly for d(f(x), f(a)), one says that f is continuous
at a if

$$\forall \epsilon > 0 \ \exists \delta > 0 \ \bigl(d(x, a) < \delta \Rightarrow d(f(x), f(a)) < \epsilon\bigr)$$

and that f is continuous if it is continuous at every a in
X. In other words, we replace differences such as |x − a|
by distances such as d(x, a). 

Continuous functions, like homomorphisms (see sec- 
tion 4.1 above), can be regarded as preserving a certain 
sort of structure. It can be shown that a function f is con-
tinuous if and only if, whenever $a_n \to x$, we also have
$f(a_n) \to f(x)$. That is, continuous functions are func-
tions that preserve the structure provided by convergent 
sequences and their limits. 

5.3 Differentiation 

The derivative of a function f at a value a is usually pre-
sented as a number that measures the rate of change of
f(x) as x passes through a. The purpose of this section 
is to promote a slightly different way of regarding it, one 
that is more general and that opens the door to much of 
modern mathematics. This is the idea of differentiation 
as linear approximation. 

Intuitively speaking, to say that f'(a) = m is to say
that if one looks through a very powerful microscope
at the graph of f in a tiny region that includes the point
(a, f(a)), then what one sees is almost exactly a straight
line of gradient m. In other words, in a sufficiently small
neighborhood of the point a, the function f is approxi-
mately linear. We can even write down a formula for the
linear function g that approximates f:

g(x) = f(a) + m(x - a). 

This is the equation of the straight line of gradient m 
that passes through the point (a, f(a)). Another way of
writing it, which is a little clearer, is 

g(a + h) = f(a) + mh, 

and to say that g approximates f in a small neighbor-
hood of a is to say that f(a + h) is approximately equal
to f(a) + mh when h is small. 

One must be a little careful here: after all, if f does
not jump suddenly, then, when h is small, f(a + h) will
be close to f(a) and mh will be small, so f(a + h) is
approximately equal to f(a) + mh. This line of reasoning
seems to work regardless of the value of m, and yet we
wanted there to be something special about the choice
m = f'(a). What singles out that particular value is that
f(a + h) is not just close to f(a) + mh, but the differ-
ence ε(h) = f(a + h) − f(a) − mh is small compared
with h. That is, ε(h)/h → 0 as h → 0. (This is a slightly
more general notion of limit than that discussed in sec-
tion 5.1, but can be recovered from it: it is equivalent to
saying that if you choose any sequence $h_1, h_2, \dots$ such
that $h_n \to 0$, then $\epsilon(h_n)/h_n \to 0$ as well.)
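
The difference between “small” and “small compared with h” shows up clearly in a computation. In this sketch the function is squaring and the point is a = 3, both chosen here for illustration; the correct gradient there is 6, and the value 5 is included as a deliberately wrong candidate.

```python
from fractions import Fraction

def f(x):
    return x * x          # illustrative function, with f'(3) = 6

def error_ratio(a, m, h):
    """epsilon(h) / h, where epsilon(h) = f(a + h) - f(a) - m*h."""
    return (f(a + h) - f(a) - m * h) / h

a = Fraction(3)
for k in range(1, 6):
    h = Fraction(1, 10**k)
    print(h, error_ratio(a, 6, h), error_ratio(a, 5, h))
```

With m = 6 the ratio equals h and so tends to 0; with m = 5 it equals 1 + h and stays near 1, even though the error itself still shrinks.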

The reason these ideas can be generalized is that the 
notion of a linear map is much more general than sim- 
ply a function from R to R of the form g(x) = mx + c. 
Many functions that arise naturally in mathematics— 
and also in science, engineering, economics, and many 
other areas— are functions of several variables, and can 
therefore be regarded as functions defined on a vec- 
tor space of dimension greater than 1. As soon as we 
look at them this way, we can ask ourselves whether, in 
a small neighborhood of a point, they can be approx- 
imated by linear maps. It is very useful if they can: a 
general function can behave in very complicated ways, 
but if it can be approximated by a linear function, then at 
least in small regions of n-dimensional space its behav- 
ior is much easier to understand. In this situation one 
can use the machinery of linear algebra and matrices, 
which leads to calculations that are feasible, especially 
if one has the help of a computer. 

Imagine, for instance, a meteorologist interested in 
how the direction and speed of the wind change as
one looks at different parts of some three-dimensional 
region above Earth’s surface. Wind behaves in compli- 
cated, chaotic ways, but to get some sort of handle on 
this behavior one can describe it as follows. To each
point (x,y,z) in the region (think of x and y as horizontal
coordinates and z as a vertical one) one can associate
a vector (u,v,w) representing the velocity of the wind 
at that point: u, v, and w are the components of the 
velocity in the x-, y-, and z-directions. 

Now let us change the point (x,y,z) very slightly by 
choosing three small numbers h, k, and l and looking at 
(x + h, y + k, z + l). At this new point, we would expect 
the wind vector to be slightly different as well, so let 
us write it (u + p,v + q,w + r). How does the small 
change (p, q, r) in the wind vector depend on the small
change (h, k, l) in the position vector? Provided the wind 
is not too turbulent and h, k, and l are small enough, we 
expect the dependence to be roughly linear: that is how 
nature seems to work. In other words, we expect there 
to be some linear map T such that (p,q,r) is roughly 
T(h,k,l) when h, k, and l are small. Notice that each
of p, q, and r depends on each of h, k, and l, so nine
numbers will be needed in order to specify this linear
map. In fact, we can express it in matrix form:

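\[
\begin{pmatrix} p \\ q \\ r \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} h \\ k \\ l \end{pmatrix}.
\]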

The matrix entries a_ij express individual dependencies.
For example, if x and z are held fixed, then we are setting
h = l = 0, from which it follows that the rate of change of
u as just y varies is given by the entry a_12. That is, a_12
is the partial derivative ∂u/∂y at the point (x,y,z).

This tells us how to calculate the matrix, but from 
the conceptual point of view it is easier to use vector 
notation. Write x for (x,y,z), u(x) for (u,v,w), h for 
(h, k, l), and p for (p, q, r). Then what we are saying is
that 

p = T(h) + e(h) 

for some vector e(h) that is small relative to h. Alterna- 
tively, we can write 

u(x + h) = u(x) + T(h) + e(h),

a formula that is closely analogous to our earlier formula
f(a + h) = f(a) + mh + e(h). This tells us that if we add
a small vector h to x, then u(x) will change by roughly 
T(h). 
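
The nine matrix entries can be estimated numerically, one partial derivative at a time. The sketch below does this for a made-up wind field and compares the actual change in the wind vector with the prediction T(h). The field, the sample point, the displacement, and the helper names jacobian and apply are all assumptions chosen for illustration.

```python
import math

def wind(x, y, z):
    """A made-up smooth wind field (u, v, w); purely illustrative."""
    return (math.sin(y) + 0.1 * z, x * y, math.cos(x) * z)

def jacobian(field, point, step=1e-6):
    """Approximate the 3x3 matrix of partial derivatives by central differences."""
    rows = [[0.0] * 3 for _ in range(3)]
    for j in range(3):                      # j indexes the input coordinate
        plus = list(point)
        minus = list(point)
        plus[j] += step
        minus[j] -= step
        fp, fm = field(*plus), field(*minus)
        for i in range(3):                  # i indexes the output component
            rows[i][j] = (fp[i] - fm[i]) / (2 * step)
    return rows

def apply(matrix, vec):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(matrix[i][j] * vec[j] for j in range(3)) for i in range(3))

x = (1.0, 2.0, 0.5)
h = (1e-3, -2e-3, 1e-3)
T = jacobian(wind, x)
moved = wind(*(xi + hi for xi, hi in zip(x, h)))
actual = tuple(a - b for a, b in zip(moved, wind(*x)))   # the change (p, q, r)
predicted = apply(T, h)                                  # the linear prediction T(h)
error = max(abs(a - p) for a, p in zip(actual, predicted))
print(actual, predicted, error)
```

In this example the discrepancy between the actual and predicted changes is far smaller than the displacement itself, which is what "small relative to h" means in practice.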

5.4 Partial Differential Equations 

Partial differential equations are of immense importance 
in physics, and have inspired a vast amount of mathe- 
matical research. Three basic examples will be discussed 
here, as an introduction to more advanced articles later 
in the volume (see, in particular, partial differential 
equations [IV.16]). 


The first is the heat equation, which, as its name sug- 
gests, describes the way the distribution of heat in a 
physical medium changes with time: 

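\[ \frac{\partial T}{\partial t} = k \left( \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} \right). \]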

Here, T(x,y,z,t) is a function that specifies the
temperature at the point (x,y,z) at time t.

It is one thing to read an equation like this and under- 
stand the symbols that make it up, but quite another to 
see what it really means. However, it is important to do 
so, since of the many expressions one could write down 
that involve partial derivatives, only a minority are of 
much significance, and these tend to be the ones that
have interesting interpretations. So let us try to interpret 
the expressions involved in the heat equation. 

The left-hand side, ∂T/∂t, is quite simple. It is the rate
of change of the temperature T(x,y,z,t) when the spatial
coordinates x, y, and z are kept fixed and t varies.
In other words, it tells us how fast the point (x,y,z) is
heating up or cooling down at time t. What would we
expect this to depend on? Well, heat takes time to travel
through a medium, so although the temperature at some
distant point (x',y',z') will eventually affect the
temperature at (x,y,z), the way the temperature is changing
right now (that is, at time t) will be affected only
by the temperatures of points very close to (x,y,z): if
points in the immediate neighborhood of (x,y,z) are
hotter, on average, than (x,y,z) itself, then we expect
the temperature at (x,y,z) to be increasing, and if they
are colder then we expect it to be decreasing.

The expression in brackets on the right-hand side 
appears so often that it has its own shorthand. The 
symbol Δ, defined by

\[ \Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}, \]

is known as the Laplacian. What information does Δf
give us about a function f? The answer is that it captures
the idea in the last paragraph: it tells us how the value
of f at (x,y,z) compares with the average value of f
in a small neighborhood of (x, y, z), or, more precisely, 
with the limit of the average value in a neighborhood 
of (x,y,z) as the size of that neighborhood shrinks to 
zero. 

This is not immediately obvious from the formula, 
but the following (not wholly rigorous) argument in one 
dimension gives a clue about why second derivatives 
should be involved. Let f be a function that takes real
numbers to real numbers. Then to obtain a good
approximation to the second derivative of f at a point x,
one can look at the expression (f'(x) - f'(x - h))/h
for some small h. (If one substitutes -h for h in the
above expression, one obtains the more usual formula,
but this one is more convenient here.) The derivatives
f'(x) and f'(x - h) can themselves be approximated by
(f(x + h) - f(x))/h and (f(x) - f(x - h))/h, respectively,
and if we substitute these approximations into
the earlier expression, then we obtain

\[ \frac{1}{h} \left( \frac{f(x + h) - f(x)}{h} - \frac{f(x) - f(x - h)}{h} \right), \]

which equals (f(x + h) - 2f(x) + f(x - h))/h².
Dividing the top of this last fraction by 2, we obtain
½(f(x + h) + f(x - h)) - f(x): that is, the difference
between the value of f at x and the average value of
f at the two surrounding points x + h and x - h.

In other words, the second derivative conveys just the 
idea we want— a comparison between the value at x and 
the average value near x. It is worth noting that if f is
linear, then the average of f(x - h) and f(x + h) will be
equal to f(x), which fits with the familiar fact that the
second derivative of a linear function f is zero.
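
The link between the second derivative and the "average minus value" can be checked directly. The sketch below is illustrative only: the test function exp, the point x = 0.7, the step h, and both helper names are assumptions made for the example.

```python
import math

def second_difference(f, x, h):
    """(f(x+h) - 2 f(x) + f(x-h)) / h^2, the approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def average_minus_value(f, x, h):
    """Average of f at the two neighbours x-h and x+h, minus f(x)."""
    return 0.5 * (f(x + h) + f(x - h)) - f(x)

x, h = 0.7, 1e-3
approx = second_difference(math.exp, x, h)   # exp is its own second derivative
print(approx, math.exp(x))

# For a linear function the neighbours' average equals the value itself.
print(average_minus_value(lambda t: 3 * t + 2, x, h))
```

The two helpers differ only by the factor 2/h², which is exactly the rescaling described in the text.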

Just as, when defining the first derivative, we have to
divide the difference f(x + h) - f(x) by h so that it is
not automatically tiny, so with the second derivative it is
appropriate to divide by h². (This is appropriate, since,
whereas the first derivative concerns linear approximations,
the second derivative concerns quadratic ones:
the best quadratic approximation for a function f near
a value x is f(x + h) ≈ f(x) + hf'(x) + ½h²f''(x),
an approximation that one can check is exact if f was a
quadratic function to start with.)

It is possible to pursue thoughts of this kind and show
that if f is a function of three variables then the value of
Δf at (x,y,z) does indeed tell us how the value of f at
(x,y,z) compares with the average values of f at points
nearby. (There is nothing special about the number 3
here: the ideas can easily be generalized to functions
of any number of variables.) All that is left to discuss
in the heat equation is the parameter k. This measures
the conductivity of the medium. If k is small, then the
medium does not conduct heat very well and ΔT has less
of an effect on the rate of change of the temperature; if
it is large then heat is conducted better and the effect is
greater.

A second equation of great importance is the Laplace
equation, Δf = 0. Intuitively speaking, this says of a
function f that its value at a point (x,y,z) is always
equal to the average value at the immediately surrounding
points. If f is a function of just one variable x,
this says that the second derivative of f is zero, which
implies that f is of the form ax + b. However, for two or
more variables, a function has more flexibility: it can lie
above the tangent lines in some directions and below them
in others. As a result, one can impose a variety of boundary
conditions on f (that is, specifications of the values
f takes on the boundaries of certain regions), and there
is a much wider and more interesting class of solutions.
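
A discrete analogue of the averaging property gives a simple way to find such solutions on a grid: fix the boundary values and repeatedly replace each interior value by the mean of its four neighbours. The sketch below assumes a square grid in two dimensions and takes its boundary values from x² - y², a function that satisfies the Laplace equation but is not of the form ax + b. The grid size, the number of sweeps, and the helper name relax are illustrative choices.

```python
def relax(grid, sweeps):
    """Repeatedly replace each interior value by the mean of its four neighbours."""
    n = len(grid)
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                     + grid[i][j - 1] + grid[i][j + 1])
    return grid

def target(x, y):
    """A known solution of the Laplace equation in two variables."""
    return x * x - y * y

n = 11
coords = [k / (n - 1) for k in range(n)]

# Boundary values come from the target; interior values start at zero.
grid = [[target(coords[i], coords[j]) if i in (0, n - 1) or j in (0, n - 1) else 0.0
         for j in range(n)] for i in range(n)]
relax(grid, 500)

worst = max(abs(grid[i][j] - target(coords[i], coords[j]))
            for i in range(n) for j in range(n))
print(worst)
```

Here the boundary condition alone is enough for the averaging process to recover the interior values of the solution.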

A third fundamental equation is the wave equation. In 
its one-dimensional formulation it describes the motion
of a vibrating string that connects two points A and B. 
Suppose that the height of the string at distance x from 
A and at time t is written h(x,t). Then the wave equation 
says that 

\[ \frac{1}{v^2} \frac{\partial^2 h}{\partial t^2} = \frac{\partial^2 h}{\partial x^2}. \]

Ignoring the constant 1/v² for a moment, the left-hand
side of this equation represents the acceleration (in a 
vertical direction) of the piece of string at distance x 
from A. This should be proportional to the force act- 
ing on it. What will govern this force? Well, suppose for 
a moment that the portion of string containing x were 
absolutely straight. Then the pull of the string on the 
left of x would exactly cancel out the pull on the right
and the net force would be zero. So, once again, what 
matters is how the height at x compares with the aver- 
age height on either side: if the string lies above the 
tangent line at x, then there will be an upwards force, 
and if it lies below, then there will be a downwards one. 
This is why the second derivative appears on the
right-hand side once again. How much force results from this
second derivative depends on factors such as the density
and tautness of the string, which is where the constant
comes in. Since h and x are both distances, v²
has dimensions of (distance/time)², which means that
v represents a speed, which is, in fact, the speed of
propagation of the wave.

Similar considerations yield the three-dimensional 
wave equation, which is, as one might now expect, 

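\[ \frac{1}{v^2} \frac{\partial^2 h}{\partial t^2} = \frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} + \frac{\partial^2 h}{\partial z^2}. \]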


One can be more concise still and write this equation as
□²h = 0, where □²h is shorthand for

\[ \Delta h - \frac{1}{v^2} \frac{\partial^2 h}{\partial t^2}. \]

The operation □² is called the d'Alembertian, after
d'Alembert [VI.19], who was the first to formulate the
wave equation.

5.5 Integration

Suppose that a car drives down a long straight road for 
one minute, and that you are told where it starts and 
what its speed is during that minute. How can you work 
out how far it has gone? If it travels at the same speed 
for the whole minute then the problem is very simple 
indeed— for example, if that speed is thirty miles per 
hour then we can divide by sixty and see that it has gone 
half a mile— but the problem becomes more interesting 
if the speed varies. Then, instead of trying to give an 
exact answer, one can use the following technique to 
approximate it. First, write down the speed of the car 
at the beginning of each of the sixty seconds that it is 
traveling. Next, for each of those seconds, do a simple 
calculation to see how far the car would have gone dur- 
ing that second if the speed had remained exactly as 
it was at the beginning of the second. Finally, add up 
all these distances. Since one second is a short time, the 
speed will not change very much during any one second, 
so this procedure gives quite an accurate answer. More- 
over, if you are not satisfied with this accuracy, then you 
can improve it by using intervals that are shorter than a 
second. 
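
The procedure just described is easy to carry out by machine. The sketch below assumes a hypothetical speed profile that rises steadily from thirty to sixty miles per hour over the minute, with speeds expressed in miles per second. The function name distance_estimate and the profile itself are illustrative, not part of the original discussion.

```python
def distance_estimate(speed, duration, steps):
    """Add up (speed at the start of each interval) x (length of the interval)."""
    dt = duration / steps
    return sum(speed(i * dt) * dt for i in range(steps))

def speed(t):
    """Hypothetical speed in miles per second: 30 mph rising steadily to 60 mph."""
    return (30 + 30 * t / 60) / 3600

for steps in (60, 600, 6000):
    print(steps, distance_estimate(speed, 60, steps))
```

For this profile the true distance is three quarters of a mile (an average of forty-five miles per hour for one minute), and the estimates approach that value as the intervals shrink.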

If you have done a first course in calculus, then you 
may well have solved such problems in a completely dif- 
ferent way. In a typical question, one is given an explicit 
formula for the speed at time t — something like at + u, 
for example— and in order to work out how far the car 
has gone one “integrates” this function to obtain the
formula ½at² + ut for the distance traveled at time t. Here,
integration simply means the opposite of differentiation:
to find the integral of a function f is to find a function
g such that g'(t) = f(t). This makes sense, because if
g(t) is the distance traveled and f(t) is the speed, then
f(t) is indeed the rate of change of g(t). 

However, antidifferentiation is not the definition of 
integration. To see why not, consider the following
question: what is the distance traveled if the speed at time t
is e^(-t²)? It is known that there is no nice function (which
means, roughly speaking, a function built up out of
standard ones such as polynomials, exponentials,
logarithms, and trigonometric functions) with e^(-t²) as its
derivative, yet the question still makes good sense and
has a definite answer. (It is possible that you have heard
of a function Φ(t) that differentiates to e^(-t²/2), from
which it follows that Φ(t√2)/√2 differentiates to e^(-t²).
However, this does not remove the difficulty, since Φ(t)
is defined as the integral of e^(-t²/2).)
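
Even without a formula for an antiderivative, the distance can be computed to any desired accuracy by summing over small intervals. The sketch below uses a midpoint sum on the interval from 0 to 1. The helper midpoint_integral and the choice of interval are assumptions for illustration. The comparison value uses Python's math.erf, which is a special function defined by this very integral rather than by an elementary formula.

```python
import math

def midpoint_integral(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] by a midpoint sum."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

def speed(t):
    return math.exp(-t * t)

distance = midpoint_integral(speed, 0.0, 1.0)

# erf(x) = (2 / sqrt(pi)) * (integral of exp(-t^2) from 0 to x)
reference = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(distance, reference)
```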

In order to define integration in situations like this
where antidifferentiation runs into difficulties, we must
fall back on messy approximations of the kind discussed
earlier. A formal definition along such lines was given by
Riemann [VI.48] in the mid nineteenth century. To see
what Riemann's basic idea is, and to see also that
integration, like differentiation, is a procedure that can usefully
be applied to functions of more than one variable, let us
look at another physical problem.

Suppose that you have a lump of impure rock and wish 
to calculate its mass from its density. Suppose also that 
this density is not constant but varies rather irregularly 
through the rock. Perhaps there are even holes inside, so 
that the density is zero in places. What should you do? 

Riemann’s approach would be this. First, you enclose 
the rock in a cuboid. For each point (x,y,z) in this
cuboid there is then an associated density d(x,y,z) 
(which will be zero if (x,y,z) lies outside the rock or 
inside a hole). Second, you divide the cuboid into a large 
number of smaller cuboids. Third, in each of the small 
cuboids you look for the point of lowest density (if any 
point in the cuboid is not in the rock, then this density 
will be zero) and the point of highest density. Let C be 
one of the small cuboids and suppose that the lowest 
and highest densities in C are a and b, respectively, and 
that the volume of C is V. Then the mass of the part 
of the rock that lies in C must lie between aV and bV. 
Fourth, add up all the numbers aV that are obtained in 
this way, and then add up all the numbers bV. If the 
totals are M_1 and M_2, respectively, then the total mass
of rock has to lie between M_1 and M_2. Finally, repeat
this calculation for subdivisions into smaller and smaller
cuboids. As you do this, the resulting numbers M_1 and
M_2 will become closer and closer to each other, and you
will have better and better approximations to the mass 
of the rock. 

Similarly, his approach to the problem about the car 
would be to divide the minute up into small intervals and 
look at the minimum and maximum speeds during those 
intervals. This would enable him to say for each interval 
that the car had traveled a distance of at least a and at 
most b. Adding up these sets of numbers, he could then 
say that over the full minute the car must have traveled 
a distance of at least D_1 (the sum of the a's) and at most
D_2 (the sum of the b's).
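
The squeeze between a lower and an upper total can be watched numerically. Finding the exact minimum and maximum on each small interval is not easy for a general function, so the sketch below makes a simplifying assumption: the function is increasing, so that the minimum and maximum on each piece occur at its left and right ends. The speed t² on the interval from 0 to 1 and the helper name lower_upper_sums are illustrative choices.

```python
def lower_upper_sums(f, a, b, pieces):
    """Lower and upper sums for an INCREASING f on [a, b].

    For such f the least value on each piece is at its left end
    and the greatest value is at its right end.
    """
    width = (b - a) / pieces
    lower = sum(f(a + i * width) for i in range(pieces)) * width
    upper = sum(f(a + (i + 1) * width) for i in range(pieces)) * width
    return lower, upper

for pieces in (10, 100, 1000):
    lo, hi = lower_upper_sums(lambda t: t * t, 0.0, 1.0, pieces)
    print(pieces, lo, hi, hi - lo)
```

In this example the gap between the two totals is exactly the width of one piece, so it shrinks to zero as the subdivision is refined, and both totals close in on 1/3.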

For both these problems we had a function (den- 
sity/speed) defined on a set (the cuboid/a minute of 
time) and in a certain sense we wanted to work out the 
“total amount” of the function. We did so by dividing 
the set into small parts and doing simple calculations 
in those parts to obtain approximations to this amount 
from below and above. This process is what is known 
as (Riemann) integration. The following notation is common:
if S is the set and f is the function, then the
total amount of f in S, known as the integral, is written
∫_S f(x) dx. Here, x denotes a typical element of S. If,
as in the density example, the elements of S are points
(x,y,z), then vector notation such as ∫_S f(x) dx, with x set in bold, can
be used, though often it is not and the reader is left to 
deduce from the context that an ordinary “x” denotes a 
vector rather than a real number. 

We have been at pains to distinguish integration from 
antidifferentiation, but a famous theorem, known as the 
fundamental theorem of calculus, asserts that the two 
procedures do, in fact, give the same answer, at least
when the function in question has certain continuity
properties that all “sensible” functions have. So it is usually
legitimate to regard integration as the opposite of
differentiation. More precisely, if f is continuous and
F(x) is defined to be ∫_a^x f(t) dt for some a, then F can
be differentiated and F'(x) = f(x). That is, if you
integrate a continuous function and differentiate it again,
you get back to where you started. Going the other way
around, if F has a continuous derivative f and a < x,
then ∫_a^x f(t) dt = F(x) - F(a). This almost says that if
you differentiate F and then integrate it again, you get
back to F. Actually, you have to choose an arbitrary
number a and what you get is the function F with the
constant F(a) subtracted.

To give an idea of the sort of exceptions that arise if 
one does not assume continuity, consider the so-called 
Heaviside step function H(x), which is 0 when x < 0
and 1 when x ≥ 0. This function has a jump at 0 and is
therefore not continuous. The integral J(x) of this
function is 0 when x < 0 and x when x ≥ 0, and for almost all
values of x we have J'(x) = H(x). However, the gradient
of J suddenly changes at 0, so J is not differentiable
there and one cannot say that J'(0) = H(0) = 1.

5.6 Holomorphic Functions

One of the jewels in the crown of mathematics is complex
analysis, which is the study of differentiable
functions that take complex numbers to complex numbers.
Functions of this kind are called holomorphic.

At first, there seems to be nothing special about such
functions, since the definition of a derivative in this
context is no different from the definition for functions of a
real variable: if f is a function then the derivative f'(z)
at a complex number z is defined to be the limit as h
tends to zero of (f(z + h) - f(z))/h. However, if we
look at this definition in a slightly different way (one
which we saw in section 5.3), we find that it is not
altogether easy for a complex function to be differentiable.
Recall from that section that differentiation means linear
approximation. In the case of a complex function,
this means that we would like to approximate it by
functions of the form g(w) = λw + μ, where λ and μ are
complex numbers. (The approximation near z will be
g(w) = f(z) + f'(z)(w - z), which gives λ = f'(z)
and μ = f(z) - zf'(z).)

Let us regard this situation geometrically. If λ ≠ 0 then
the effect of multiplying by λ is to expand z by some
factor r and to rotate it by some angle θ. This means that
many transformations of the plane that we would
ordinarily consider to be linear, such as reflections, shears,
or stretches, are ruled out. We need two real numbers
to specify λ (whether we write it in the form a + bi or
re^(iθ)), but to specify a general linear transformation of
the plane takes four (see the discussion of matrices in
section 4.2). This reduction in the number of degrees of
freedom is expressed by a pair of differential equations
called the Cauchy-Riemann equations. Instead of writing
f(z) let us write u(x + iy) + iv(x + iy), where x and y
are the real and imaginary parts of z and u(x + iy) and
v(x + iy) are the real and imaginary parts of f(x + iy).
Then the linear approximation to f near z has the matrix

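\[ \begin{pmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2ex] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{pmatrix}. \]

The matrix of an expansion and rotation always has the
form \begin{pmatrix} a & b \\ -b & a \end{pmatrix}, from which we deduce that

\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \]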

These are the Cauchy-Riemann equations. One conse- 
quence of these equations is that 

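\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{\partial^2 v}{\partial x \, \partial y} - \frac{\partial^2 v}{\partial y \, \partial x} = 0. \]

(It is not obvious that the necessary conditions hold for
the symmetry of the mixed partial derivatives, but when
f is holomorphic they do.) Therefore, u satisfies the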
Laplace equation (which was discussed in section 5.4). 
A similar argument shows that v does as well. 
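
The Cauchy-Riemann equations can be tested numerically by estimating the four partial derivatives with small real and imaginary displacements. In the sketch below, the sample point, the step size, and the helper names partials and cauchy_riemann_residual are assumptions made for illustration. The exponential function is used as an example of a holomorphic function, and complex conjugation (a reflection) as an example of one that is not.

```python
import cmath

def partials(f, z, step=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) at z = x + iy."""
    fx = (f(z + step) - f(z - step)) / (2 * step)            # derivative in x of u + iv
    fy = (f(z + 1j * step) - f(z - 1j * step)) / (2 * step)  # derivative in y of u + iv
    return fx.real, fy.real, fx.imag, fy.imag

def cauchy_riemann_residual(f, z):
    """Largest violation of u_x = v_y and u_y = -v_x at the point z."""
    ux, uy, vx, vy = partials(f, z)
    return max(abs(ux - vy), abs(uy + vx))

z = 0.3 + 0.8j
print(cauchy_riemann_residual(cmath.exp, z))                 # very close to 0
print(cauchy_riemann_residual(lambda w: w.conjugate(), z))   # close to 2
```

Conjugation fails the test because its linear approximation is a reflection, which is one of the transformations ruled out above.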

These facts begin to suggest that complex differentiability
is a much stronger condition than real differentiability
and that we should expect holomorphic functions
to have interesting properties. For the remainder
of this subsection, let us look at a few of the remarkable 
properties that they do indeed have. 

The first is related to the fundamental theorem
of calculus (discussed in the previous subsection).
Suppose that F is a holomorphic function and we are given
its derivative f and the value of F(u) for some complex
number u. How can we reconstruct F? An approximate
method is as follows. Let w be another complex number
and let us try to work out F(w). We take a sequence
of points z_0, z_1, ..., z_n with z_0 = u and z_n = w, and
with the differences |z_1 - z_0|, |z_2 - z_1|, ..., |z_n - z_{n-1}|
all small. We can then approximate F(z_{i+1}) - F(z_i) by
(z_{i+1} - z_i)f(z_i). It follows that F(w) - F(u), which
equals F(z_n) - F(z_0), is approximated by the sum of
all the (z_{i+1} - z_i)f(z_i). (Since we have added together
many small errors, it is not obvious that this
approximation is a good one, but it turns out that it is.) We can
imagine a number z that starts at u and follows a path P
to w by jumping from one z_i to another in small steps of
δz = z_{i+1} - z_i. In the limit as n goes to infinity and the
steps δz go to zero we obtain a so-called path integral,
which is denoted ∫_P f(z) dz.

The above argument has the consequence that if the 
path P begins and ends at the same point u, then 
the path integral ∫_P f(z) dz is zero. Equivalently, if two
paths P_1 and P_2 have the same starting point u and the
same endpoint w, then the path integrals ∫_{P_1} f(z) dz and
∫_{P_2} f(z) dz are the same, since they both give the value
F(w) - F(u). 
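
The approximating sums can be computed directly, and doing so along two different routes illustrates the path independence. The sketch below assumes f(z) = z², which is the derivative of F(z) = z³/3, and compares a straight path from u = 0 to w = 1 + i with a detour through the point 1. The helper names path_integral and segment, and the number of steps, are illustrative.

```python
def path_integral(f, points):
    """Sum of (z_{i+1} - z_i) * f(z_i) along a list of points on a path."""
    return sum((b - a) * f(a) for a, b in zip(points, points[1:]))

def segment(start, end, n):
    """n + 1 evenly spaced points on the straight line from start to end."""
    return [start + (end - start) * k / n for k in range(n + 1)]

def f(z):
    return z * z            # the derivative of F below

def F(z):
    return z ** 3 / 3

u, w = 0j, 1 + 1j
direct = segment(u, w, 20_000)
detour = segment(u, 1 + 0j, 10_000) + segment(1 + 0j, w, 10_000)[1:]

print(path_integral(f, direct))
print(path_integral(f, detour))
print(F(w) - F(u))
```

Both sums are approximations, so they agree with F(w) - F(u) only up to an error that shrinks as the steps are made smaller.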

Of course, in order to establish this, we made the big 
assumption that f was the derivative of a function F.
Cauchy’s theorem says that the same conclusion is true
if f is holomorphic. That is, rather than requiring f to
be the derivative of another function, it asks for f itself
to have a derivative. If that is the case, then any path
integral of f depends only on where the path begins and
ends. What is more, these path integrals can be used to
define a function F that differentiates to f, so a function
with a derivative automatically has an antiderivative. 

It is not necessary for the function f to be defined on
the whole of C for Cauchy’s theorem to be valid: every- 
thing remains true if we restrict attention to a simply 
connected domain, which means an open set with no 
holes in it. If there are holes, then two path integrals 
may differ if the paths go around the holes in different 
ways. Thus, path integrals have a close connection with 
the topology of subsets of the plane, an observation that 
has many ramifications throughout modern geometry. 
For more on topology, see section 6.4 of this article and 
algebraic topology [IV.10].

A very surprising fact, which can be deduced from
Cauchy’s theorem, is that if f is holomorphic then it
can be differentiated twice. (This is completely untrue
of real-valued functions: consider, for example, the
function f where f(x) = 0 when x < 0 and f(x) = x² when
x ≥ 0.) It follows that f' is holomorphic, so it too can
be differentiated twice. Continuing, one finds that f can
be differentiated any number of times. Thus, for complex
functions differentiability implies infinite differentiability.
(This property is what is used to establish the
symmetry, and even the existence, of the mixed partial
derivatives mentioned earlier.)

A closely related fact is that wherever a holomorphic
function is defined it can be expanded in a power series.
That is, if f is defined and differentiable everywhere on
an open disk of radius R about w, then it will be given
by a formula of the form

\[ f(z) = \sum_{n=0}^{\infty} a_n (z - w)^n \]

valid everywhere in that disk. This is called the Taylor
expansion of f.

Another fundamental property of holomorphic functions,
one that shows just how “rigid” they are, is that
their entire behavior is determined just by what they do
in a small region. That is, if f and g are holomorphic and
they take the same values in some tiny disk, then they
must take the same values everywhere. This remarkable
fact allows a process of analytic continuation. If it is
difficult to define a holomorphic function f everywhere you
want it defined, then you can simply define it in some
small region and say that elsewhere it takes the only
possible values that are consistent with the ones that
you have just specified. This is how the famous Riemann
zeta function [IV.4 §3] is conventionally defined.

6 What Is Geometry? 

It is not easy to do justice to geometry in this article 
because the fundamental concepts of the subject are 
either too simple to need explaining— for example, there 
is no need to say here what a circle, line, or plane is— 
or sufficiently advanced that they are better discussed in 
parts III and IV of the book. However, if you have not met 
the advanced concepts and have no idea what modern 
geometry is like, then you will get much more out of this
book if you understand two basic ideas: the relationship 
between geometry and symmetry, and the notion of a 
manifold. These ideas will occupy us for the rest of the 
article. 

6.1 Geometry and Symmetry Groups 

Broadly speaking, geometry is the part of mathemat- 
ics that involves the sort of language that one would 
conventionally regard as geometrical, with words such 
as “point,” “line,” “plane,” “space,” “curve,” “sphere,”
“cube,” “distance,” and “angle” playing a prominent
role. However, there is a more sophisticated view, first 
advocated by Klein [VI.56], which regards transformations
as the true subject matter of geometry. So, to the
above list one should add words like “reflection,” “rota- 
tion,” “translation,” “stretch,” “shear,” and “projection,” 
together with slightly more nebulous concepts such as 
“angle-preserving map” or “continuous deformation.” 

As was discussed in section 2.1, transformations go 
hand in hand with groups, and for this reason there
is an intimate connection between geometry and group 
theory. Indeed, given any group of transformations, 
there is a corresponding notion of geometry, in which 
one studies the phenomena that are unaffected by
transformations in that group. In particular, two shapes are
regarded as equivalent if one can be turned into the 
other by means of one of the transformations in the 
group. Different groups will of course lead to differ- 
ent notions of equivalence, and for this reason mathe- 
maticians frequently talk about geometries, rather than 
about a single monolithic subject called geometry. This 
subsection contains brief descriptions of some of the 
most important geometries and their associated groups 
of transformations. 

6.2 Euclidean Geometry 

Euclidean geometry is what most people would think 
of as “ordinary” geometry, and, not surprisingly given 
its name, it includes the basic theorems of Greek geom- 
etry that were the staple of geometers for thousands of 
years. For example, the theorem that the three angles of 
a triangle add up to 180° belongs to Euclidean geometry. 

To understand Euclidean geometry from a transfor- 
mational viewpoint, we need to say how many dimen- 
sions we are working in, and we must of course specify 
a group of transformations. The appropriate group is the 
group of rigid transformations. These can be thought of
in two different ways. One is that they are the transfor- 
mations of the plane, or of space, or more generally of 
R^n for some n, that preserve distance. That is, T is a rigid
transformation if, given any two points x and y, the dis- 
tance between Tx and Ty is always the same as the dis- 
tance between x and y. (In dimensions greater than 3, 
distance is defined in a way that naturally generalizes 
the Pythagorean formula. See metric spaces [III.58] for
more details.) 

It turns out that every such transformation can be 
realized as a combination of rotations, reflections, and 
translations, and this gives us a more concrete way to 
think about the group. Euclidean geometry, in other 
words, is the study of concepts that do not change when 
you rotate, reflect, or translate, and these include points,
lines, planes, circles, spheres, distance, angle, length,
area, and volume. The rotations of R^n form an important
group, the special orthogonal group, known as SO(n).
The larger orthogonal group O(n) includes reflections
as well. (It is not quite obvious how to define a
“rotation” of n-dimensional space, but it is not too hard to
do. An orthogonal map of R^n is a linear map T that
preserves distances, in the sense that d(Tx, Ty) is always
the same as d(x, y). It is a rotation if its determinant
[III.15] is 1. The only other possibility for the
determinant of a distance-preserving map is -1. Such maps are
like reflections in that they turn space “inside out.”)
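
In the plane these statements are easy to check by direct calculation. The sketch below takes one rotation and one reflection, confirms that each preserves the distance between two sample points, and computes the determinants. The angle, the sample points, and the helper names rotation, apply, and det are illustrative assumptions.

```python
import math

def rotation(theta):
    """Matrix of the rotation of the plane through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(m, p):
    """Apply a 2x2 matrix to a point of the plane."""
    return (m[0][0] * p[0] + m[0][1] * p[1], m[1][0] * p[0] + m[1][1] * p[1])

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

reflection = ((1.0, 0.0), (0.0, -1.0))   # reflection in the x-axis

x, y = (3.0, 1.0), (-2.0, 4.5)
for name, m in (("rotation", rotation(0.7)), ("reflection", reflection)):
    before = math.dist(x, y)
    after = math.dist(apply(m, x), apply(m, y))
    print(name, det(m), before, after)
```

Both maps leave the distance unchanged, and only the determinant (1 for the rotation, -1 for the reflection) tells them apart.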

6.3 Affine Geometry 

There are many linear maps besides rotations and reflec- 
tions. What happens if we enlarge our group from SO(n) 
or O(n) to include as many of them as possible? For a 
transformation to be part of a group it must be invertible 
and not all linear maps are, so the natural group to look 
at is the group GL_n(R) of all invertible linear
transformations of R^n, a group that we first met in section 4.2.
These maps all leave the origin fixed, but if we want
we can incorporate translations and consider a larger
group that consists of all transformations of the form
x ↦ Tx + b, where b is a fixed vector and T is an
invertible linear map. The resulting geometry is called affine
geometry. 

Since linear maps include stretches and shears, they 
preserve neither distance nor angle, so these are not 
concepts of affine geometry. However, points, lines, and 
planes remain as points, lines, and planes after an invert- 
ible linear map and a translation, so these concepts do 
belong to affine geometry. Another affine concept is that 
of two lines being parallel. (That is, although angles in 
general are not preserved by linear maps, angles of zero 
are.) This means that although there is no such thing as 
a square or a rectangle in affine geometry, one can still 
talk about a parallelogram. Similarly, one cannot talk of 
circles but one can talk of ellipses, since a linear
transformation of an ellipse is another ellipse (provided
that one regards a circle as a special kind of ellipse). 

6.4 Topology 

The idea that the geometry associated with a group 
of transformations “studies the concepts that are pre- 
served by all the transformations” can be made more 
precise using the notion of equivalence relations
[I.2 §2.3]. Indeed, let G be a group of transformations of
R^n. We might think of a d-dimensional “shape” as being
a subset S of R^n, but if we are doing G-geometry, then
we do not want to distinguish between a set S and any
other set we can obtain from it using a transformation in
G. So in that case we say that the two shapes are
equivalent. For example, two shapes are equivalent in Euclidean
geometry if and only if they are congruent in the usual
sense, whereas in two-dimensional affine geometry all
parallelograms are equivalent, as are all ellipses. One can
think of the basic objects of G-geometry as equivalence
classes of shapes rather than the shapes themselves.

Figure 1 A sphere morphing into a cube.

Topology can be thought of as the geometry that 
arises when we use a particularly generous notion of 
equivalence, saying that two shapes are equivalent, or 
homeomorphic, to use the technical term, if each can be 
“continuously deformed” into the other. For example, a
sphere and a cube are equivalent in this sense, as figure 1 
illustrates. 

Because there are very many continuous deforma- 
tions, it is quite hard to prove that two shapes are not 
equivalent in this sense. For example, it may seem obvious
that a sphere (this means the surface of a ball rather
than the solid ball) cannot be continuously deformed
into a torus (the shape of the surface of a doughnut
of the kind that has a hole in it), since they are
fundamentally different shapes: one has a “hole” and the
other does not. However, it is not easy to turn this
intuition into a rigorous argument. For more on this kind
of problem, see invariants [1.4 §2.2] and differential
topology [IV.9].

6.5 Spherical Geometry 

We have been steadily relaxing our requirements for two 
shapes to be equivalent, by allowing more and more 
transformations. Now let us tighten up again and look 
at spherical geometry. Here the universe is no longer $\mathbb{R}^n$
but the n-dimensional sphere $S^n$, which is defined to be
the surface of the (n + 1)-dimensional ball, or, to put it
more algebraically, the set of all points $(x_1, x_2, \dots, x_{n+1})$
in $\mathbb{R}^{n+1}$ such that $x_1^2 + x_2^2 + \cdots + x_{n+1}^2 = 1$. Just as the
surface of a three-dimensional ball is two dimensional,
so this set is n dimensional. We shall discuss the case
n = 2 here, but it is easy to generalize the discussion to
larger n.

The appropriate group of transformations is SO(3):
the group of all rotations about some axis that goes
through the origin. (One could allow reflections as well
and take O(3).) These are symmetries of the sphere $S^2$,
and that is how we regard them in spherical geometry,
rather than as transformations of the whole of $\mathbb{R}^3$.

Among the concepts that make sense in spherical 
geometry are line, distance, and angle. It may seem odd 
to talk about a line if one is confined to the surface of 
a ball, but a “spherical line” is not a line in the usual
sense. Rather, it is a subset of $S^2$ obtained by intersecting
$S^2$ with a plane through the origin. This produces a
great circle, that is, a circle of radius 1, which is as large
as it can be given that it lives inside a sphere of radius 1.

The reason that a great circle deserves to be thought 
of as some sort of line is that the shortest path between 
any two points x and y in $S^2$ will always be along a
great circle, provided that the path is confined to $S^2$.
This is a very natural restriction to make, since we are
regarding $S^2$ as our “universe.” It is also a restriction
of some practical relevance, since the shortest sensible 
route between two distant points on Earth’s surface will 
not be the straight-line route that burrows hundreds of 
miles underground. 

The distance between two points x and y is defined to 
be the length of the shortest path from x to y that lies 
entirely in $S^2$. (If x and y are opposite each other, then
there are infinitely many shortest paths, all of length $\pi$,
so the distance between x and y is $\pi$.) How about the
angle between two spherical lines? Well, the lines are
intersections of $S^2$ with two planes, so one can define it
to be the angle between these two planes in the Euclidean
sense. A more aesthetically pleasing way to view this,
because it does not involve ideas external to the sphere,
is to notice that if you look at a very small region about
one of the two points where two spherical lines cross,
then that portion of the sphere will be almost flat, and
the lines almost straight. So you can define the angle to
be the usual angle between the “limiting” straight lines
inside the “limiting” plane.

Spherical geometry differs from Euclidean geometry 
in several interesting ways. For example, the angles of 
a spherical triangle always add up to more than 180°. 
Indeed, if you take as the vertices the North Pole, a point 
on the equator, and a second point a quarter of the way 
around the equator from the first, then you obtain a
triangle with three right angles. The smaller a triangle, the
flatter it becomes, and so the closer the sum of its angles
comes to 180°. There is a beautiful theorem that gives a
precise expression to this: if we switch to radians, and
if we have a spherical triangle with angles $\alpha$, $\beta$, and $\gamma$,
then its area is $\alpha + \beta + \gamma - \pi$. (For example, this formula
tells us that the triangle with three angles of $\frac{1}{2}\pi$ has area
$\frac{1}{2}\pi$, which indeed it does as the surface area of a ball of
radius 1 is $4\pi$ and this triangle occupies one-eighth of
the surface.)
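
The area formula can be checked numerically. The sketch below is one possible way of doing so, with helper names of our own choosing: it measures each angle of a spherical triangle as the angle between the two tangent directions at that vertex (the “limiting straight lines” description given earlier) and then forms the angle sum minus $\pi$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(a / n for a in v)

def tangent_toward(a, b):
    """Unit tangent vector at a pointing along the great circle toward b."""
    d = dot(a, b)
    return normalize(tuple(bi - d * ai for ai, bi in zip(a, b)))

def vertex_angle(a, b, c):
    """Angle at vertex a of the spherical triangle abc (all unit vectors)."""
    cos_angle = dot(tangent_toward(a, b), tangent_toward(a, c))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def triangle_area(a, b, c):
    """Area on the unit sphere, computed as the angle sum minus pi."""
    return (vertex_angle(a, b, c) + vertex_angle(b, c, a)
            + vertex_angle(c, a, b) - math.pi)

# The triangle from the text: North Pole and two equatorial points a
# quarter of the way around the equator from each other.
north_pole = (0.0, 0.0, 1.0)
equator_1 = (1.0, 0.0, 0.0)
equator_2 = (0.0, 1.0, 0.0)
octant_area = triangle_area(north_pole, equator_1, equator_2)

# A much smaller triangle near the North Pole: its angle sum barely
# exceeds pi, so the computed area is tiny.
small_area = triangle_area(north_pole,
                           normalize((0.01, 0.0, 1.0)),
                           normalize((0.0, 0.01, 1.0)))

print(octant_area, 4 * math.pi / 8, small_area)
```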

6.6 Hyperbolic Geometry 

So far, the idea of defining geometries with reference 
to sets of transformations may look like nothing more 
than a useful way to view the subject, a unified approach 
to what would otherwise be rather different-looking 
aspects. However, when it comes to hyperbolic geom- 
etry, the transformational approach becomes indispens- 
able, for reasons that will be explained in a moment. 

The group of transformations that produces hyperbolic
geometry is called PSL(2,R), the projective special
linear group in two dimensions. One way to present this
group is as follows. The special linear group SL(2,R) is
the set of all matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with determinant [III.15]
$ad - bc$ equal to 1. (These form a group because the
product of two matrices with determinant 1 again has
determinant 1.) To make this “projective,” one then
regards each matrix A as equivalent to $-A$: for example,
the matrices $\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}$ and $\begin{pmatrix} -3 & 1 \\ 5 & -2 \end{pmatrix}$ are equivalent.

To get from this group to the geometry one must first 
interpret it as a group of transformations of some two- 
dimensional set of points. Once we have done this, we 
have what is called a model of two-dimensional hyper- 
bolic geometry. The subtlety is that, unlike with spheri- 
cal geometry, where the sphere was the “obvious” model, 
there is no single model of hyperbolic geometry that is 
clearly the best. (In fact, there are alternative models of
spherical geometry. For example, there is a natural way
of associating with each rotation of $\mathbb{R}^3$ a transformation
of $\mathbb{R}^2$ with a “point at infinity” added, so the extended
plane can be used as a model of spherical geometry.) The
three most commonly used models of hyperbolic geom- 
etry are called the half-plane model, the disk model, and 
the hyperboloid model. 

The half-plane model is the one most directly associated
with the group PSL(2,R). The set in question is
the upper half-plane of the complex numbers C, that is,
the set of all complex numbers $z = x + yi$ such that
$y > 0$. Given a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, the corresponding
transformation is the one that takes the point z to the point
$(az + b)/(cz + d)$. (Notice that if we replace a, b, c, and d
by their negatives, then we get the same transformation.)
The condition $ad - bc = 1$ can be used to show that the
transformed point will still lie in the upper half-plane,
and also that the transformation can be inverted.
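
These claims are easy to test on an example. The following sketch uses an arbitrarily chosen real matrix of determinant 1 and an arbitrary sample point (both are our own illustrative choices). It relies on the standard identity that the imaginary part of $(az + b)/(cz + d)$ equals $(ad - bc)$ times the imaginary part of z divided by $|cz + d|^2$, which is the computation behind the statement that the upper half-plane is preserved.

```python
def mobius(a, b, c, d, z):
    """The transformation z -> (az + b)/(cz + d)."""
    return (a * z + b) / (c * z + d)

# Illustrative real matrix with ad - bc = 3*2 - (-1)*(-5) = 1.
a, b, c, d = 3.0, -1.0, -5.0, 2.0

z = complex(0.5, 0.25)          # a sample point with positive imaginary part
w = mobius(a, b, c, d, z)

# Negating every entry of the matrix gives the same transformation.
w_negated = mobius(-a, -b, -c, -d, z)

# Im(w) = Im(z) / |cz + d|^2 when ad - bc = 1, so w stays in the half-plane.
predicted_imag = z.imag / abs(c * z + d) ** 2

# The inverse matrix (d, -b; -c, a) gives the inverse transformation.
z_back = mobius(d, -b, -c, a, w)

print(w, predicted_imag, z_back)
```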

What this does not yet do is tell us anything about 
distances, and it is here that we need the group to
“generate” the geometry. If we are to have a notion of
distance d that is sensible from the perspective of our
group of transformations, then it is important that the 
transformations should preserve it. That is, if T is one 
of the transformations and z and w are two points in 
the upper half-plane, then d(T(z),T(w)) should always 
be the same as d(z, w). It turns out that there is essen- 
tially only one definition of distance that has this prop- 
erty, and that is the sense in which the group defines the 
geometry. (One could of course multiply all distances by 
some constant factor such as 3, but this would be like 
measuring distances in feet instead of yards, rather than 
a genuine difference in the geometry.) 
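
The text does not write the distance down, so the sketch below assumes the usual normalization of the half-plane metric, for which $d(z, w) = \operatorname{arccosh}\bigl(1 + |z - w|^2 / (2\,\mathrm{Im}\,z\,\mathrm{Im}\,w)\bigr)$. With that assumption, one can check on an example that a transformation of the kind described above leaves distances unchanged. The matrix and the sample points are arbitrary illustrative choices.

```python
import math

def mobius(m, z):
    """Apply z -> (az + b)/(cz + d) for m = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def hyperbolic_distance(z, w):
    """Half-plane distance under the standard normalization (an assumption here)."""
    return math.acosh(1.0 + abs(z - w) ** 2 / (2.0 * z.imag * w.imag))

m = ((3.0, -1.0), (-5.0, 2.0))            # real entries, determinant 1
z, w = complex(0.5, 0.25), complex(-1.0, 2.0)

before = hyperbolic_distance(z, w)
after = hyperbolic_distance(mobius(m, z), mobius(m, w))

# Along the imaginary axis the distance from i to 2i is log 2.
vertical = hyperbolic_distance(1j, 2j)

print(before, after, vertical)
```

Multiplying `hyperbolic_distance` by a positive constant would give another invariant distance, which is the rescaling freedom mentioned in the parenthetical remark.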

This distance has some properties that at first seem 
odd. For example, a typical hyperbolic line takes the form 
of a semicircular arc with endpoints on the real axis. 
However, it is semicircular only from the point of view of 
the Euclidean geometry of C: from a hyperbolic perspec- 
tive it would be just as odd to regard a Euclidean straight 
line as straight. The reason for the discrepancy is that 
hyperbolic distances become larger and larger, relative 
to Euclidean ones, the closer you get to the real axis. To
get from a point z to another point w, it is therefore 
shorter to take a “detour” away from the real axis, and 
the best detour turns out to be along an arc of the circle 
that goes through z and w and cuts the real axis at right 
angles. (If z and w are on the same vertical line, then one 
obtains a “degenerate circle,” namely that vertical line.) 
These facts are no more paradoxical than the fact that
a flat map of the world involves distortions of spher- 
ical geometry, making Greenland very large, for exam- 
ple. The half-plane model is like a “map” of a geometric 
structure, the hyperbolic plane, that in reality has a very 
different shape. 

One of the most famous properties of two-dimen- 
sional hyperbolic geometry is that it provides a geometry 
in which Euclid’s parallel postulate fails to hold. That is, 
it is possible to have a hyperbolic line L, a point x not
on the line, and two different hyperbolic lines through 
x, neither of which meets L. All the other axioms of 
Euclidean geometry are, when suitably interpreted, true 
of hyperbolic geometry as well. It follows that the paral- 
lel postulate cannot be deduced from those axioms. This 
discovery, associated with Gauss [VI.25], Bolyai [VI.33],
and Lobachevskii [VI.30], solved a problem that had
bothered mathematicians for over two thousand years. 

Another property complements the result about the 
sum of the angles of spherical and Euclidean triangles. 
There is a natural notion of hyperbolic area, and the area 
of a hyperbolic triangle with angles $\alpha$, $\beta$, and $\gamma$ is
$\pi - \alpha - \beta - \gamma$. Thus, in the hyperbolic plane $\alpha + \beta + \gamma$ is always
less than $\pi$, and it almost equals $\pi$ when the triangle
is very small. These properties of angle sums reflect the
fact that the sphere has positive curvature [III.13], the
Euclidean plane is “flat,” and the hyperbolic plane has
negative curvature.

Figure 2 A tessellation of the hyperbolic disk.

The disk model, conceived in a famous moment of 
inspiration by Poincaré [VI.60] as he was getting into
a bus, takes as its set of points the open unit disk in C,
that is, the set D of all complex numbers with modulus
less than 1. This time, a typical transformation takes
the following form. One takes a real number $\theta$ and a
complex number a from inside D, and sends each z
in D to the point $e^{i\theta}(z - a)/(1 - \bar{a}z)$. It is not
completely obvious that these transformations form a group,
and still less that the group is isomorphic to PSL(2,R).
However, it turns out that the function that takes z to
$-(iz + 1)/(z + i)$ maps the unit disk to the upper
half-plane and vice versa. This shows that the two models
give the same geometry and can be used to transfer
results from one to the other.
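
A quick numerical sanity check of these statements is sketched below. The sample points, the angle, and the parameter a are arbitrary illustrative choices, and the function names are ours. The check confirms on examples that the map $z \mapsto -(iz + 1)/(z + i)$ sends points of the disk into the upper half-plane and points of the upper half-plane into the disk, that applying it twice returns the original point, and that a typical disk transformation keeps points inside the disk.

```python
import cmath

def disk_to_half_plane(z):
    """The map z -> -(iz + 1)/(z + i) from the text."""
    return -(1j * z + 1) / (z + 1j)

def disk_map(theta, a, z):
    """A typical disk transformation z -> e^{i theta} (z - a)/(1 - conj(a) z)."""
    return cmath.exp(1j * theta) * (z - a) / (1 - a.conjugate() * z)

# Sample points of modulus less than 1, and sample points with Im > 0.
disk_points = [0j, complex(0.5, 0.0), complex(-0.3, 0.6),
               complex(0.0, 0.9), complex(-0.7, -0.35)]
half_plane_points = [1j, complex(2.0, 0.5), complex(-3.0, 4.0)]

images_in_half_plane = all(disk_to_half_plane(z).imag > 0 for z in disk_points)
images_in_disk = all(abs(disk_to_half_plane(w)) < 1 for w in half_plane_points)

# Applying the map twice gives back the starting point ("and vice versa").
is_involution = all(
    abs(disk_to_half_plane(disk_to_half_plane(z)) - z) < 1e-12
    for z in disk_points
)

# A disk transformation with illustrative parameters keeps the disk invariant.
theta, a = 0.7, complex(0.3, -0.4)
disk_preserved = all(abs(disk_map(theta, a, z)) < 1 for z in disk_points)

print(images_in_half_plane, images_in_disk, is_involution, disk_preserved)
```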

As with the half-plane model, distances become larger, 
relative to Euclidean distances, as you approach the
boundary of the disk: from a hyperbolic perspective, the 
diameter of the disk is infinite and it does not really 
have a boundary. Figure 2 shows a tessellation of the 
disk by shapes that are congruent in the sense that any 
one can be turned into any other by means of a transfor- 
mation from the group. Thus, even though they do not 
look identical, within hyperbolic geometry they all have 
the same size and shape. Straight lines in the disk model 
are either arcs of (Euclidean) circles that meet the unit
circle at right angles, or segments of (Euclidean) straight
lines that pass through the center of the disk. 

The hyperboloid model is the model that explains why 
the geometry is called hyperbolic. This time the set is 
the hyperboloid consisting of all points $(x, y, z) \in \mathbb{R}^3$
such that $z > 0$ and $z^2 = 1 + x^2 + y^2$. This is the
hyperboloid of revolution about the z-axis of the hyperbola
$z^2 = 1 + x^2$ in the plane $y = 0$. A general transformation
in the group is a sort of “rotation” of the hyperboloid,
and can be built up from genuine rotations about the
z-axis, and “hyperbolic rotations” of the xz-plane, which
have matrices of the form

$$\begin{pmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{pmatrix}.$$

Just as an ordinary rotation preserves the unit circle, one
of these hyperbolic rotations preserves the hyperbola
$z^2 = 1 + x^2$, moving points around inside it. Again, it
is not quite obvious that this gives the same group of
transformations, but it does, and the hyperboloid model
is equivalent to the other two.
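
The analogy with ordinary rotations can be made concrete. An ordinary rotation leaves $x^2 + z^2$ unchanged, whereas a hyperbolic rotation leaves $x^2 - z^2$ unchanged, so it maps each hyperbola on which that quantity is constant to itself. The sketch below (sample values and function names are our own) checks this on one point, and also checks that hyperbolic rotations compose by adding their parameters, just as angles add for ordinary rotations.

```python
import math

def hyperbolic_rotation(theta, x, z):
    """Apply the matrix ((cosh t, sinh t), (sinh t, cosh t)) to the vector (x, z)."""
    ch, sh = math.cosh(theta), math.sinh(theta)
    return (ch * x + sh * z, sh * x + ch * z)

def form(x, z):
    """The quantity x^2 - z^2 that hyperbolic rotations preserve."""
    return x * x - z * z

x0, z0 = 0.75, 1.25                      # sample point; here x^2 - z^2 = -1
x1, z1 = hyperbolic_rotation(0.8, x0, z0)

# Rotating by 0.8 and then by -0.3 agrees with a single rotation by 0.5.
x2, z2 = hyperbolic_rotation(-0.3, x1, z1)
x3, z3 = hyperbolic_rotation(0.5, x0, z0)

print(form(x0, z0), form(x1, z1), (x2, z2), (x3, z3))
```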

6.7 Projective Geometry 

Projective geometry is regarded by many as an old-fash- 
ioned subject, and it is no longer taught in schools, but 
it still has an important role to play in modern mathe- 
matics. We shall concentrate here on the real projective 
plane, but projective geometry is possible in any number 
of dimensions and with scalars in any field. This makes
it particularly useful to algebraic geometers. 

Here are two ways of regarding the projective plane. 
The first is that the set of points is the ordinary plane, 
together with a “point at infinity.” The group of
transformations consists of functions known as projections.
To understand what a projection is, imagine two planes
P and P' in space, and a point x that is not in either of
them. We can “project” P onto P' as follows. If a is a
point in P, then its image $\phi(a)$ is the point where the
line joining x to a meets P'. (If this line is parallel to
P', then $\phi(a)$ is the point at infinity of P'.) Thus, if you
are at x and a picture is drawn on the plane P, then its
image under the projection $\phi$ will be the picture drawn
on P' that to you looks exactly the same. In fact,
however, it will have been distorted, so the transformation
$\phi$ has made a difference to the shape. To turn $\phi$ into
a transformation of P itself, one can follow it by a rigid
transformation that moves P' back to where P is.

Such projections do not preserve distances, but 
among the interesting concepts that they do preserve are 
points, lines, quantities known as cross-ratios, and, most
famously, conic sections. A conic section is the intersection
of a plane with a cone, and it can be a circle, an
ellipse, a parabola, or a hyperbola. From the point of
view of projective geometry, these are all the same kind
of object (just as, in affine geometry, one can talk about
ellipses but there is no special ellipse called a circle).
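
The text does not define the cross-ratio, so the following sketch adopts one common convention: for four points on a line with coordinates $t_1, t_2, t_3, t_4$, the cross-ratio is $(t_1 - t_3)(t_2 - t_4) / ((t_1 - t_4)(t_2 - t_3))$. It also assumes the standard fact that, restricted to a single line, a projection acts on the coordinate as a fractional linear map $t \mapsto (at + b)/(ct + d)$ with $ad - bc \neq 0$. The particular map and points are illustrative. Exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction

def cross_ratio(t1, t2, t3, t4):
    """Cross-ratio of four distinct points on a line (one common convention)."""
    return ((t1 - t3) * (t2 - t4)) / ((t1 - t4) * (t2 - t3))

def projective_map(t):
    """Illustrative fractional linear map t -> (2t + 1)/(t + 3); 2*3 - 1*1 != 0."""
    return (2 * t + 1) / (t + 3)

# Four equally spaced points on a line, and their images.
points = [Fraction(0), Fraction(1), Fraction(2), Fraction(3)]
images = [projective_map(t) for t in points]

# Distances are distorted: the images are no longer equally spaced ...
gaps_before = [points[i + 1] - points[i] for i in range(3)]
gaps_after = [images[i + 1] - images[i] for i in range(3)]

# ... but the cross-ratio is unchanged (both values equal 4/3).
ratio_before = cross_ratio(*points)
ratio_after = cross_ratio(*images)

print(gaps_before, gaps_after, ratio_before, ratio_after)
```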

A second view of the projective plane is that it is the 
set of all lines in $\mathbb{R}^3$ that go through the origin. Since a
line is determined by the two points where it intersects
the unit sphere, one can regard this set as a sphere, but 
with the significant difference that opposite points are 
regarded as the same— because they correspond to the 
same line. (This is quite hard to imagine, but not impos- 
sible. Suppose that, whatever happened on one side of 
the world, an identical copy of that event happened at 
the exactly corresponding place on the opposite side. If 
one was used to this situation and traveled from Paris, 
say, to the copy of Paris on the other side of the world, 
would one actually think that it was a different place? 
It would look the same and appear to have all the same 
people, and just as you arrived an identical copy of you, 
whom you could never meet, would be arriving in the 
“real” Paris. It might under such circumstances be more 
natural to say that there was only one Paris and only one 
you and that the world was not a sphere but a projective 
plane.) 

Under this view, a typical transformation of the pro- 
jective plane is obtained as follows. Take any invertible 
linear map, and apply it to $\mathbb{R}^3$. This takes lines through
the origin to lines through the origin, and can therefore
be thought of as a function from the projective
plane to itself. If one invertible linear map is a multiple
of another, then they will have the same effect on
all lines, so the resulting group of transformations is
like $\mathrm{GL}_3(\mathbb{R})$, except that all nonzero multiples of any
given matrix are regarded as equivalent. This group
is called the projective special linear group PSL(3,R),
and it is the three-dimensional equivalent of PSL(2,R),
which we have already met. Since PSL(3,R) is bigger
than PSL(2,R), the projective plane comes with a richer
set of transformations than the hyperbolic plane, which 
is why fewer geometrical properties are preserved. (For 
example, as we have seen, there is a useful notion of 
hyperbolic distance, but no obvious notion of projective 
distance.) 
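
The remark that a matrix and its nonzero multiples have the same effect on lines through the origin can be seen in a few lines of code. In this sketch (the matrix, the multiple, and the direction vector are arbitrary illustrative choices), a line through the origin is represented by any nonzero vector on it, and two vectors represent the same line exactly when their cross product vanishes. Integer entries keep the arithmetic exact.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (tuple of rows) by a vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def scale(m, s):
    """Multiply every entry of the matrix m by the scalar s."""
    return tuple(tuple(s * e for e in row) for row in m)

def cross3(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def same_line(u, v):
    """Nonzero u and v span the same line through the origin iff u x v = 0."""
    return all(c == 0 for c in cross3(u, v))

M = ((1, 2, 0), (0, 1, 3), (4, 0, 1))    # determinant 25, so invertible
direction = (1, -2, 5)                    # a line through the origin

image = mat_vec(M, direction)
image_under_multiple = mat_vec(scale(M, -7), direction)

# M and -7M move the line to the same place, although M does move it.
multiples_agree = same_line(image, image_under_multiple)
line_is_moved = not same_line(direction, image)

print(image, image_under_multiple, multiples_agree, line_is_moved)
```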

6.8 Lorentz Geometry 

This is a geometry used in the theory of special relativity 
to model four-dimensional spacetime, otherwise known 
as Minkowski space. The main difference between it and 
four-dimensional Euclidean geometry is that, instead 
of the usual notion of distance between two points 
$(t, x, y, z)$ and $(t', x', y', z')$, one considers the quantity

$$-(t - t')^2 + (x - x')^2 + (y - y')^2 + (z - z')^2,$$

which would be the square of the Euclidean distance
were it not for the all-important minus sign before
$(t - t')^2$. This reflects the fact that space and time are
significantly different (though intertwined).


A Lorentz transformation is a linear map from $\mathbb{R}^4$ to $\mathbb{R}^4$
that preserves these “generalized distances.” Letting g
be the linear map that sends $(t, x, y, z)$ to $(-t, x, y, z)$
and letting G be the corresponding matrix (which has
$-1, 1, 1, 1$ down the diagonal and 0 everywhere else),
we can define a Lorentz transformation abstractly as
one whose matrix A satisfies $A G A^{T} = G$, where
$A^{T}$ is the transpose of A.
(The transpose of a matrix A is the matrix B defined by
$B_{ij} = A_{ji}$.)

A point $(t, x, y, z)$ is said to be spacelike if $-t^2 + x^2 + y^2 + z^2 > 0$,
and timelike if $-t^2 + x^2 + y^2 + z^2 < 0$. If
$-t^2 + x^2 + y^2 + z^2 = 0$, then the point lies in the light
cone. All these are genuine concepts of Lorentz geometry 
because they are preserved by Lorentz transformations. 
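
To see this preservation in an example, the sketch below uses one familiar family of Lorentz transformations, the “boosts” along the x-axis, which act on the (t, x) coordinates by a hyperbolic rotation. The choice of a boost, the parameter value, the sample points, and the function names are all our own illustrative assumptions. The code checks that the quantity $-t^2 + x^2 + y^2 + z^2$ is unchanged and that each sample point keeps its classification.

```python
import math

def interval(p):
    """The quantity -t^2 + x^2 + y^2 + z^2 for a point p = (t, x, y, z)."""
    t, x, y, z = p
    return -t * t + x * x + y * y + z * z

def boost_x(phi, p):
    """Boost along the x-axis with parameter phi (an illustrative Lorentz map)."""
    t, x, y, z = p
    ch, sh = math.cosh(phi), math.sinh(phi)
    return (ch * t - sh * x, -sh * t + ch * x, y, z)

def classify(p, tol=1e-9):
    """Classify a point, with a small tolerance for rounding error."""
    s = interval(p)
    if s > tol:
        return "spacelike"
    if s < -tol:
        return "timelike"
    return "light cone"

samples = [
    (1.0, 3.0, 0.0, 0.0),   # interval  8
    (2.0, 1.0, 0.5, 0.0),   # interval -2.75
    (5.0, 3.0, 4.0, 0.0),   # interval  0
]
boosted = [boost_x(0.6, p) for p in samples]

labels_before = [classify(p) for p in samples]
labels_after = [classify(p) for p in boosted]

print(labels_before, labels_after)
```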

Lorentzian geometry is also of fundamental impor- 
tance to general relativity, which can be thought of as 
the study of Lorentzian manifolds. These are closely 
related to Riemannian manifolds, which are discussed 
in section 6.10. For a discussion of general relativity, 
see GENERAL RELATIVITY AND THE EINSTEIN EQUATIONS 
[IV.17].

6.9 Manifolds and Differential Geometry 

To somebody who has not been taught otherwise, it is 
natural to think that Earth is flat, or rather that it consists
of a flat surface on top of which there are buildings,
mountains, and so on. However, we now know that it is in
fact more like a sphere, appearing to be flat only because
it is so large. There are various kinds of evidence for this.
One is that if you stand on a cliff by the sea then you can
see a definite horizon, not too far away, over which ships
disappear. This would be hard to explain if Earth were 
genuinely flat. Another is that if you travel far enough 
in what feels like a straight line then you eventually get 
back to where you started. A third is that if you travel 
along a triangular route and the triangle is a large one, 
then you will be able to detect that its three angles add 
up to more than 180°. 

It is also very natural to believe that the geometry that 
best models that of the universe is three-dimensional 
Euclidean geometry, or what one might think of as “normal”
geometry. However, this could be just as much of
a mistake as believing that two-dimensional Euclidean 
geometry is the best model for Earth’s surface. 

Indeed, one can immediately improve on it by consid- 
ering Lorentz geometry as a model of spacetime, but 
even if there were no theory of special relativity, our 
astronomical observations would give us no particular 
reason to suppose that Euclidean geometry was the best
model for the universe. Why should we be so sure that
we would not obtain a better model by taking the three- 
dimensional surface of a very large four-dimensional 
sphere? This might feel like “normal” space in just the 
way that the surface of Earth feels like a “normal” plane 
unless you travel large distances. Perhaps if you trav- 
eled far enough in a rocket without changing your course 
then you would end up where you started. 

It is easy to describe “normal” space mathematically: 
one just associates with each point in space a triple of 
coordinates (x,y,z) in the usual way. How might we 
describe a huge “spherical” space? It is slightly harder, 
but not much: one can give each point four coordinates
$(x, y, z, w)$ but add the condition that these must
satisfy the equation $x^2 + y^2 + z^2 + w^2 = R^2$ for some
fixed R that we think of as the “radius” of the universe.
This describes the three-dimensional surface of a
four-dimensional sphere of radius R in just the same way
that the equation $x^2 + y^2 + z^2 = R^2$ describes the
two-dimensional surface of a three-dimensional sphere of
radius R.

A possible objection to this approach is that it seems 
to rely on the rather implausible idea that the universe 
lives in some larger unobserved four-dimensional space. 
However, this objection can be answered. The object we
have just defined, the 3-sphere $S^3$, can also be described
in what is known as an intrinsic way: that is, without 
reference to some surrounding space. The easiest way 
to see this is to discuss the 2-sphere first, in order to 
draw an analogy. 

Let us therefore imagine a planet covered with calm 
water. If you drop a large rock into the water at the 
North Pole, a wave will propagate out in a circle of ever- 
increasing radius. (At any one moment, it will be a circle 
of constant latitude.) In due course, however, this circle 
will reach the equator, after which it will start to shrink, 
until eventually the whole wave reaches the South Pole 
at once, in a sudden burst of energy. 

Now imagine setting off a three-dimensional wave in 
space— it could, for example, be a light wave caused 
by the switching on of a bright light. The front of this 
wave would now be not a circle but an ever-expanding 
spherical surface. It is logically possible that this surface 
could expand until it became very large and then con- 
tract again, not by shrinking back to where it started, 
but by turning itself inside out, so to speak, and shrink- 
ing to another point on the opposite side of the uni- 
verse. (Notice that in the two-dimensional example, what 
you want to call the inside of the circle changes when 
the circle passes the equator.) With a bit of effort, 
one can visualize this possibility, and there is no need
to appeal to the existence of a fourth dimension in
order to do so. More to the point, this account can be 
turned into a mathematically coherent and genuinely 
three-dimensional description of the 3-sphere. 

A different and more general approach is to use what 
is called an atlas. An atlas of the world (in the nor- 
mal, everyday sense) consists of a number of flat pages, 
together with an indication of their overlaps: that is, of
how parts of some pages correspond to parts of others. 
Now, although such an atlas is mapping out an exter- 
nal object that lives in a three-dimensional universe, the 
spherical geometry of Earth’s surface can be read off 
from the atlas alone. It may be much less convenient to 
do this but it is possible: rotations, for example, might be 
described by saying that such-and-such a part of page 17
moved to a similar but slightly distorted part of page 24, 
and so on. 

Not only is this possible, but one can define a surface 
by means of two-dimensional atlases. For example, there 
is a mathematically neat “atlas” of the 2-sphere that con- 
sists of just two pages, both of them circular. One is 
a map of the Northern Hemisphere plus a little bit of 
the Southern Hemisphere near the equator (to provide 
a small overlap) and the other is a map of the South- 
ern Hemisphere with a bit of the Northern Hemisphere. 
Because these maps are flat, they necessarily involve 
some distortion, but one can specify what this distortion is.

The idea of an atlas can easily be generalized to three 
dimensions. A “page” now becomes a portion of three- 
dimensional space. The technical term is not “page” but 
“chart,” and a three-dimensional atlas is a collection of 
charts, again with specifications of which parts of one 
chart correspond to which parts of another. A possible 
atlas of the 3-sphere, generalizing the simple atlas of 
the 2-sphere just discussed, consists of two solid three- 
dimensional balls. There is a correspondence between 
points toward the edge of one of these balls and points 
toward the edge of the other, and this can be used to 
describe the geometry: as you travel toward the edge of 
one ball you find yourself in the overlapping region, so 
you are also in the other ball. As you go further, you are 
off the map as far as the first ball is concerned, but the 
second ball has by that stage taken over. 

The 2-sphere and the 3-sphere are basic examples of 
manifolds. Other examples that we have already met in 
this section are the torus and the projective plane. Infor- 
mally, a d-dimensional manifold, or d-manifold, is any 
geometrical object M with the property that every point 
x in M is surrounded by what feels like a portion of
d-dimensional Euclidean space. So, because small parts of
a sphere, torus, or projective plane are very close to planar,
they are all 2-manifolds, though when the dimension
is two the word surface is more usual. (However,
it is important to remember that a “surface” need not 
be the surface of anything.) Similarly, the 3-sphere is a 
3-manifold. 

The formal definition of a manifold uses the idea of 
atlases: indeed, one says that the atlas is a manifold. This 
is a typical mathematician's use of the word “is,” and it 
should not be confused with the normal use. In practice, 
it is unusual to think of a manifold as a collection of 
charts with rules for how parts of them correspond, but 
the definition in terms of charts and atlases turns out 
to be the most convenient when one wishes to reason 
about manifolds in general rather than discussing spe- 
cific examples. For the purposes of this book, it may be 
better to think of a d-manifold in the “extrinsic” way that
we first thought about the 3-sphere: as a d-dimensional
“hypersurface” living in some higher-dimensional space.
Indeed, there is a famous theorem of Nash that states
that all manifolds arise in this way. Note, however, that
it is not always easy to find a simple formula for defining
such a hypersurface. For example, while the 2-sphere is
described by the simple formula $x^2 + y^2 + z^2 = 1$ and the
torus by the slightly more complicated and more artificial
formula $(r - 2)^2 + z^2 = 1$, where r is shorthand
for $\sqrt{x^2 + y^2}$, it is not easy to come up with a formula
that describes a two-holed torus. Even the usual torus 
is far more easily described using quotients, as we did 
in section 3.3. Quotients can also be used to define a 
two-holed torus (see fuchsian groups [III.28]), and the 
reason one is confident that the result is a manifold is 
that every point has a small neighborhood that looks 
like a small part of the Euclidean plane. In general, a 
d-dimensional manifold can be thought of as any con- 
struction that gives rise to an object that is “locally like 
Euclidean space of d dimensions.” 
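
The torus formula can be checked against a parametrization. The sketch below takes r to be the distance from the z-axis, $\sqrt{x^2 + y^2}$, and uses the usual description of a torus as a circle of radius 1 swept around a circle of radius 2. The parametrization and the function names are our own choices, not something given in the text.

```python
import math

def torus_point(u, v):
    """Point on the torus: angle u around the z-axis, angle v around the tube."""
    r = 2.0 + math.cos(v)
    return (r * math.cos(u), r * math.sin(u), math.sin(v))

def torus_defect(p):
    """Value of (r - 2)^2 + z^2 - 1 with r = sqrt(x^2 + y^2); zero on the torus."""
    x, y, z = p
    r = math.hypot(x, y)
    return (r - 2.0) ** 2 + z ** 2 - 1.0

angles_u = [0.0, 1.0, 2.5, 4.0]
angles_v = [0.0, 0.7, 3.0, 5.5]

# Every parametrized point satisfies the equation up to rounding error.
max_defect = max(abs(torus_defect(torus_point(u, v)))
                 for u in angles_u for v in angles_v)

# A point that is clearly off the torus, such as the origin, does not.
origin_defect = torus_defect((0.0, 0.0, 0.0))

print(max_defect, origin_defect)
```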

An extremely important feature of manifolds is that 
calculus is possible for functions defined on them. 
Roughly speaking, if M is a manifold and f is a function
from M to $\mathbb{R}$, then to see whether f is differentiable at a
point x in M you first find a chart that contains x (or a
representation of it), and regard f as a function defined
on the chart instead. Since the chart is a portion of the
d-dimensional Euclidean space $\mathbb{R}^d$ and we can differentiate
functions defined on such sets, the notion of
differentiability now makes sense for f. Of course, for this
definition to work for the manifold, it is important that
if x belongs to two overlapping charts, then the answer
will be the same for both. This is guaranteed if the function
that gives the correspondence between the overlapping
parts (known as a transition function) is itself differentiable.
Manifolds with this property are called differentiable
manifolds: manifolds for which the transition
functions are continuous but not necessarily differentiable
are called topological manifolds. The availability
of calculus makes the theory of differentiable manifolds 
very different from that of topological manifolds. 
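
For a concrete transition function, it is convenient to swap the two hemisphere maps described earlier for a different two-chart atlas of the 2-sphere, namely stereographic projection from the North Pole and from the South Pole. This substitution is our own choice for the sake of a simple formula. With these charts the transition function is $(u, v) \mapsto (u, v)/(u^2 + v^2)$, which is differentiable wherever the two charts overlap (that is, away from $(u, v) = (0, 0)$). The sketch checks the formula on a sample point.

```python
import math

def chart_north(p):
    """Stereographic projection from the North Pole; undefined at (0, 0, 1)."""
    x, y, z = p
    return (x / (1.0 - z), y / (1.0 - z))

def chart_south(p):
    """Stereographic projection from the South Pole; undefined at (0, 0, -1)."""
    x, y, z = p
    return (x / (1.0 + z), y / (1.0 + z))

def transition(uv):
    """North-chart coordinates to south-chart coordinates, for uv != (0, 0)."""
    u, v = uv
    r2 = u * u + v * v
    return (u / r2, v / r2)

def sphere_point(latitude, longitude):
    """Point of the unit sphere with the given latitude and longitude (radians)."""
    return (math.cos(latitude) * math.cos(longitude),
            math.cos(latitude) * math.sin(longitude),
            math.sin(latitude))

p = sphere_point(0.3, 1.1)               # a point covered by both charts
via_transition = transition(chart_north(p))
direct = chart_south(p)

print(via_transition, direct)
```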

The above ideas generalize easily from real-valued 
functions to functions from M to $\mathbb{R}^d$, or from M to M',
where M' is another manifold. However, it is easier to
judge whether a function defined on a manifold is
differentiable than it is to say what the derivative is. The
derivative at some point x of a function from $\mathbb{R}^n$ to $\mathbb{R}^m$
is a linear map, and so is the derivative of a function 
defined on a manifold. However, the domain of the lin- 
ear map is not the manifold itself, which is not usually 
a vector space, but rather the so-called tangent space at 
the point x in question. 

For more details on this and on manifolds in general, 
see DIFFERENTIAL TOPOLOGY [IV.9]. 

6.10 Riemannian Metrics 

Suppose you are given two points P and Q on a sphere. 
How do you determine the distance between them? The 
answer depends on how the sphere is defined. If it is the 
set of all points $(x, y, z)$ such that $x^2 + y^2 + z^2 = 1$
then P and Q are points in $\mathbb{R}^3$. One can therefore use the
Pythagorean theorem to calculate the distance between
them. For example, the distance between the points
(1,0,0) and (0,1,0) is $\sqrt{2}$.

However, do we really want to measure the length of 
the line segment PQ? This segment does not lie in the 
sphere itself, so to use it as a means of defining length
does not sit at all well with the idea of a manifold as 
an intrinsically defined object. Fortunately, as we saw 
earlier in the discussion of spherical geometry, there is 
another natural definition that avoids this problem: we 
can define the distance between P and Q as the length 
of the shortest path from P to Q that lies entirely within 
the sphere. 
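
The two notions of distance are easy to compare numerically. In the sketch below (function names are ours), the straight-line distance comes from the Pythagorean theorem, while the distance within the sphere uses the fact that the shortest path lies along a great circle, so that for unit vectors its length is the angle between them.

```python
import math

def chord_distance(p, q):
    """Straight-line distance through the surrounding space."""
    return math.dist(p, q)

def great_circle_distance(p, q):
    """Length of the shortest path within the unit sphere between p and q."""
    cos_angle = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

P = (1.0, 0.0, 0.0)
Q = (0.0, 1.0, 0.0)

chord = chord_distance(P, Q)                 # sqrt(2)
arc = great_circle_distance(P, Q)            # a quarter of a great circle
antipodal = great_circle_distance(P, (-1.0, 0.0, 0.0))

print(chord, arc, antipodal)
```

The path within the sphere is longer than the chord, as it must be, and opposite points are at distance $\pi$.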

Now let us suppose that we wish to talk more gener- 
ally about distances between points in manifolds. If the 
manifold is presented to us as a hypersurface in some 
bigger space, then we can use lengths of shortest paths 
as we did in the sphere. But suppose that the manifold is 
presented differently and all we have is a way of demon- 
strating that every point is contained in a chart— that is, 
has a neighborhood that can be associated with a por- 
tion of d-dimensional Euclidean space. (For the purposes 
of this discussion, nothing is lost if one takes d to be
2 throughout, in which case there is a correspondence
between the neighborhood and a portion of the plane.) 
One idea is to define the distance between the two points 
to be the distance between the corresponding points in 
the chart, but this raises at least three problems. 

The first is that the points P and Q that we are looking 
at might belong to different charts. This, however, is not 
too much of a problem, since all we actually need to do is 
calculate lengths of paths, and that can be done provided
we have a way of defining distances between points that
are very close together, in which case we can find a single
chart that contains them both. 

The second problem, which is much more serious, is 
that for any one manifold there are many ways of choos- 
ing the charts, so this idea does not lead to a single 
notion of distance for the manifold. Worse still, even if 
one fixes one set of charts, these charts will overlap, and 
it may not be possible to make the notions of distance 
compatible where the overlap occurs. 

The third problem is related to the second. The surface
of a sphere is curved, whereas the charts of any atlas (in 
either the everyday or the mathematical sense) are flat. 
Therefore, the distances in the charts cannot correspond 
exactly to the lengths of shortest paths in the sphere 
itself. 

The single most important moral to draw from the 
above problems is that if we wish to define a notion of 
distance for a given manifold, we have a great deal of 
choice about how to do so. Very roughly, a Riemannian 
metric is a way of making such a choice. 

A little less roughly, a metric means a sensible 
notion of distance (the precise definition can be found 
in [III.58]). A Riemannian metric is a way of determining
infinitesimal distances. These infinitesimal distances
can be used to calculate lengths of paths, and then the 
distance between two points can be defined as the length
of the shortest path between them. To see how this is 
done, let us first think about lengths of paths in the
ordinary Euclidean plane. Suppose that $(x, y)$ belongs to a
path and $(x + \delta x, y + \delta y)$ is another point on the path,
very close to $(x, y)$. Then the distance between the two
points is $\sqrt{\delta x^2 + \delta y^2}$. To calculate the length of a
sufficiently smooth path, one can choose a large number of
points along the path, each one very close to the next, 
and add up their distances. This gives a good approxi- 
mation, and one can make it better and better by taking 
more and more points. 
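
The approximation scheme just described can be sketched in a few lines of Python. The example path (a quarter of the unit circle, whose true length is pi/2) and the function names are assumptions chosen purely for illustration.

```python
import math

def polyline_length(path, n):
    """Approximate the length of path(t), 0 <= t <= 1, using n straight segments."""
    total = 0.0
    prev = path(0.0)
    for i in range(1, n + 1):
        cur = path(i / n)
        # Pythagorean distance between consecutive sample points.
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    return total

def quarter_circle(t):
    """Illustrative path: a quarter of the unit circle, true length pi/2."""
    angle = t * math.pi / 2
    return (math.cos(angle), math.sin(angle))

for n in (1, 10, 100, 1000):
    print(n, polyline_length(quarter_circle, n))
print(math.pi / 2)
```

With a single segment the estimate is just the chord; as the number of points grows, the estimates increase toward pi/2.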

In practice, it is easier to work out the length using cal- 
culus. A path itself can be thought of as a moving point 
$(x(t), y(t))$ that starts when $t = 0$ and ends when $t = 1$.
If $\delta t$ is very small, then $x(t + \delta t)$ is approximately
$x(t) + x'(t)\,\delta t$ and $y(t + \delta t)$ is approximately
$y(t) + y'(t)\,\delta t$. Therefore, the distance between
$(x(t), y(t))$ and $(x(t + \delta t), y(t + \delta t))$ is approximately
$\delta t \sqrt{x'(t)^2 + y'(t)^2}$, by the Pythagorean theorem.
Therefore, letting $\delta t$ go to zero and integrating all the
infinitesimal distances along the path, we obtain the formula

$$\int_0^1 \sqrt{x'(t)^2 + y'(t)^2}\, dt$$


for the length of the path. Notice that if we write 
$x'(t)$ and $y'(t)$ as $dx/dt$ and $dy/dt$, then we can
rewrite $\sqrt{x'(t)^2 + y'(t)^2}\,dt$ as $\sqrt{dx^2 + dy^2}$, which is
the infinitesimal version of our earlier expression
$\sqrt{\delta x^2 + \delta y^2}$. We have just defined a Riemannian
metric, which is usually denoted by $dx^2 + dy^2$. This can be
thought of as the square of the distance between $(x, y)$
and the infinitesimally close point $(x + dx, y + dy)$.

If we want to, we can now prove that the shortest path 
between two points $(x_0, y_0)$ and $(x_1, y_1)$ is a straight
line, which will tell us that the distance between them
is $\sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}$. (A proof can be found in
variational methods [III.94].) However, since we could 
have just used this formula to begin with, this exam- 
ple does not really illustrate what is distinctive about 
Riemannian metrics. To do that, let us give a more pre- 
cise definition of the disk model for hyperbolic geom- 
etry, which was discussed in section 6.6. There it was 
stated that distances become larger, relative to Euclid- 
ean distances, as one approaches the edge of the disk. 
A more precise definition is that the open unit disk is 
the set of all points $(x, y)$ such that $x^2 + y^2 < 1$ and
that the Riemannian metric on this disk is given by the
expression $(dx^2 + dy^2)/(1 - x^2 - y^2)$. This is how we
define the square of the distance between (x,y) and 
(x + dx,y + dy). Equivalently, the length of a path 
(x(t),y(t)) with respect to this Riemannian metric is 
defined as 


$$\int_0^1 \sqrt{\frac{x'(t)^2 + y'(t)^2}{1 - x(t)^2 - y(t)^2}}\, dt.$$
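
As a rough numerical illustration, the following Python sketch estimates such a length for one particular path. It takes the integrand to be the square root of $(x'(t)^2 + y'(t)^2)/(1 - x(t)^2 - y(t)^2)$, in line with the expression for the metric stated above. The choice of path (the radial segment from the center to the point (0.5, 0)), the midpoint rule, and all function names are assumptions made for the example.

```python
import math

def disk_length(path, velocity, n=20000):
    """Midpoint-rule estimate of the integral over 0 <= t <= 1 of
    sqrt((x'^2 + y'^2) / (1 - x^2 - y^2))."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        x, y = path(t)
        dx, dy = velocity(t)
        total += math.sqrt((dx * dx + dy * dy) / (1.0 - x * x - y * y)) / n
    return total

def radial_path(t):
    """Illustrative path: straight segment from (0, 0) to (0.5, 0)."""
    return (0.5 * t, 0.0)

def radial_velocity(t):
    """Derivative of radial_path with respect to t."""
    return (0.5, 0.0)

print(disk_length(radial_path, radial_velocity))  # numerical estimate
print(math.asin(0.5))                             # closed form of the same integral
```

For this path the integral reduces to that of $1/\sqrt{1 - r^2}$ from 0 to 0.5, which is arcsin(0.5). The result exceeds the Euclidean length 0.5, in keeping with the remark that distances grow relative to Euclidean ones toward the edge of the disk.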


More generally, a Riemannian metric on a portion of 
the plane is an expression of the form 


$$E(x,y)\,dx^2 + 2F(x,y)\,dx\,dy + G(x,y)\,dy^2$$


that is used to calculate infinitesimal distances and 
hence lengths of paths. (In the disk model we took 
$E(x,y)$ and $G(x,y)$ to be $1/(1 - x^2 - y^2)$ and $F(x,y)$
to be 0.) It is important for these distances to be
positive, which will turn out to be the case provided
that $E(x,y)G(x,y) - F(x,y)^2$ is always positive. One
also needs the functions E, F, and G to satisfy certain 
smoothness conditions.

This definition generalizes straightforwardly to
more dimensions. In $n$ dimensions we must specify
the squared distance between $(x_1, \dots, x_n)$ and
$(x_1 + dx_1, \dots, x_n + dx_n)$, using an expression of the
form

$$\sum_{i,j=1}^{n} F_{ij}(x_1, \dots, x_n)\, dx_i\, dx_j.$$

The numbers $F_{ij}(x_1, \dots, x_n)$ form an $n \times n$ matrix
that depends on the point $(x_1, \dots, x_n)$. This matrix
is required to be symmetric and positive definite,
which means that $F_{ij}(x_1, \dots, x_n)$ should always equal
$F_{ji}(x_1, \dots, x_n)$ and the expression that determines the
squared distance should always be positive. It should
also depend smoothly on the point $(x_1, \dots, x_n)$.

Finally, now that we know how to define many differ- 
ent Riemannian metrics on portions of Euclidean space, 
we have many potential ways to define metrics on the 
charts that we use to define a manifold. A Riemannian 
metric on a manifold is a way of choosing compatible 
Riemannian metrics on the charts, where “compatible” 
means that wherever two charts overlap the distances 
should be the same. As mentioned earlier, once one 
has done this, one can define the distance between two 
points to be the length of a shortest path between them. 

Given a Riemannian metric on a manifold, it is pos- 
sible to define many other concepts, such as angles 
and volumes. It is also possible to define the important
concept of curvature, which is discussed in Ricci
flow [III.80]. Another important definition is that of a
geodesic, which is the analogue for Riemannian geom- 
etry of a straight line in Euclidean geometry. A curve C 
is a geodesic if, given any two points P and Q on C that 
are sufficiently close, the shortest path from P to Q is 
part of C. For example, the geodesics on the sphere are 
the great circles. 

As should be clear by now from the above discussion, 
on any given manifold there is a multitude of possi- 
ble Riemannian metrics. A major theme in Riemannian 
geometry is to choose one that is “best” in some way.
For example, on the sphere, if we take the obvious defi- 
nition of the length of a path, then the resulting metric 
is particularly symmetric, and this is a highly desirable 
property. In particular, with this Riemannian metric the 
curvature of the sphere is the same everywhere. More 
generally, one searches for extra conditions to impose 
on Riemannian metrics. Ideally, these conditions should 
be strong enough that there is just one Riemannian met- 
ric that satisfies them, or at least that the family of such 
metrics should be very small. 


1.4 The General Goals of 
Mathematical Research 


The previous article introduced many concepts that 
appear throughout mathematics. This one discusses 
what mathematicians do with those concepts, and the 
sorts of questions they ask about them. 

1 Solving Equations 

As we have seen in earlier articles, mathematics is full 
of objects and structures (of a mathematical kind), but 
they do not simply sit there for our contemplation: we 
also like to do things to them. For example, given a num- 
ber, there will be contexts in which we want to double 
it, or square it, or work out its reciprocal; given a suit- 
able function, we may wish to differentiate it; given a 
geometrical shape, we may wish to transform it; and so 
on. 

Transformations like these give rise to a never-ending 
source of interesting problems. If we have defined some 
mathematical process, then a rather obvious mathe- 
matical project is to invent techniques for carrying it 
out. This leads to what one might call direct questions 
about the process. However, there is also a deeper set of 
inverse questions, which take the following form. Sup- 
pose you are told what process has been carried out and 
what answer it has produced. Can you then work out 
what the mathematical object was that the process was 
applied to? For example, suppose I tell you that I have 
just taken a number and squared it, and that the result 
was 9. Can you tell me the original number? 

In this case the answer is more or less yes: it must have 
been 3, except that if negative numbers are allowed, then 
another solution is -3. 

If we want to talk more formally, then we say that we 
have been examining the equation $x^2 = 9$, and have
discovered that there are two solutions. This example raises
three issues that appear again and again. 

• Does a given equation have any solutions? 

• If so, does it have exactly one solution? 

• What is the set in which solutions are required to 
live? 

The first two concerns are known as the existence and the 
uniqueness of solutions. The third does not seem particularly
interesting in the case of the equation $x^2 = 9$, but
in more complicated cases, such as partial differential 
equations, it can be a subtle and important question.

To use more abstract language, suppose that $f$ is a
function [1.2 §2.2] and we are faced with a statement 
of the form f(x) = y. The direct question is to work 
out y given what x is. The inverse question is to work 
out x given what y is: this would be called solving the 
equation f(x) = y. Not surprisingly, questions about 
the solutions of an equation of this form are closely 
related to questions about the invertibility of the function
$f$, which were discussed in [1.2]. Because x and y
can be very much more general objects than numbers, 
the notion of solving equations is itself very general, and 
for that reason it is central to mathematics. 

1.1 Linear Equations 

The very first equations a schoolchild meets will typi- 
cally be ones like 2x + 3 = 17. To solve simple equa- 
tions like this, one treats x as an unknown number that 
obeys the usual rules of arithmetic. By exploiting these 
rules one can transform the equation into something 
much simpler: subtracting 3 from both sides we learn 
that 2x = 14, and dividing both sides of this new equa- 
tion by 2 we then discover that x = 7. If we are very 
careful, we will notice that all we have shown is that if 
there is some number x such that 2x + 3 = 17 then x 
must be 7. What we have not shown is that there is any 
such x. So strictly speaking there is a further step of 
checking that $2 \times 7 + 3 = 17$. This will obviously be true
here, but the corresponding assertion is not always true 
for more complicated equations so this final step can be 
important. 

The equation 2x + 3 = 17 is called “linear” because 
the function $f$ we have performed on x (to multiply it
by 2 and add 3) is a linear one, in the sense that its graph 
is a straight line. As we have just seen, linear equations 
involving a single unknown x are easy to solve, but mat- 
ters become considerably more sophisticated when one 
starts to deal with more than one unknown. Let us look 
at a typical example of an equation in two unknowns, the 
equation $3x + 2y = 14$. This equation has many solutions:
for any choice of y you can set $x = (14 - 2y)/3$
and you have a pair $(x, y)$ that satisfies the equation.
To make it harder, one can take a second equation as
well, $5x + 3y = 22$, say, and try to solve the two equations
simultaneously. Then, it turns out, there is just one
solution, namely x = 2 and y = 4. Typically, two lin- 
ear equations in two unknowns have exactly one solu- 
tion, just as these two do, which is easy to see if one 
thinks about the situation geometrically. An equation of 
the form ax + by = c is the equation of a straight line in 
the xy-plane. Two lines normally meet in a single point, 


the exceptions being when they are identical, in which 
case they meet in infinitely many points, or parallel but 
not identical, in which case they do not meet at all. 

If one has several equations in several unknowns, it 
can be conceptually simpler to think of them as one 
equation in one unknown. This sounds impossible, but 
it is perfectly possible if the new unknown is allowed 
to be a more complicated object. For example, the two 
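equations $3x + 2y = 14$ and $5x + 3y = 22$ can be rewritten
as the following single equation involving matrices
and vectors:

$$\begin{pmatrix} 3 & 2 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 14 \\ 22 \end{pmatrix}.$$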

If we let A stand for the matrix, x for the unknown col- 
umn vector, and b for the known one, then this equation 
becomes simply $Ax = b$, which looks much less complicated,
even if in fact all we have done is hidden the
complication behind our notation. 
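
For readers who like to experiment, here is a minimal Python sketch that solves a pair of linear equations in two unknowns exactly and also reports the two exceptional cases (identical lines and parallel lines). It uses Cramer's rule, a standard technique chosen here for brevity rather than one discussed in the text. It assumes integer coefficients and that each equation genuinely describes a line; `solve_2x2` is a hypothetical name.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, p, q):
    """Solve a*x + b*y = p and c*x + d*y = q exactly (integer inputs assumed).

    Returns (x, y) when there is a unique solution, otherwise a string
    saying whether the two lines coincide or are parallel.
    """
    det = a * d - b * c
    if det != 0:
        # Cramer's rule: the lines meet in exactly one point.
        return (Fraction(p * d - b * q, det), Fraction(a * q - p * c, det))
    # det == 0: the two lines have the same direction.
    if a * q == c * p and b * q == d * p:
        return "identical lines: infinitely many solutions"
    return "parallel lines: no solutions"

print(solve_2x2(3, 2, 5, 3, 14, 22))  # the system from the text: x = 2, y = 4
print(solve_2x2(3, 2, 6, 4, 14, 28))  # second equation is twice the first
print(solve_2x2(3, 2, 6, 4, 14, 30))  # same direction, different line
```

The quantity `det` is the determinant of the matrix A; it is nonzero exactly when the two lines meet in a single point.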

There is more to this process, however, than sweep- 
ing dirt under the carpet. While the simpler notation 
conceals many of the specific details of the problem, 
it also reveals very clearly what would otherwise be 
obscured: that we have a linear map from $\mathbb{R}^2$ to $\mathbb{R}^2$ and
we want to know which vectors x, if any, map to the 
vector b. When faced with a particular set of simul- 
taneous equations, this reformulation does not make 
much difference— the calculations we have to do are 
the same— but when we wish to reason more generally, 
either directly about simultaneous equations or about
other problems where they arise, it is much easier to 
think about a matrix equation with a single unknown 
vector than about a collection of simultaneous equations 
in several unknown numbers. This phenomenon occurs 
throughout mathematics and is a major reason for the 
study of high-dimensional spaces. 

1.2 Polynomial Equations 

We have just discussed the generalization of linear equa- 
tions from one variable to several variables. Another 
direction in which one can generalize them is to think 
of linear functions as polynomials of degree 1 and consider
functions of higher degree. At school, for example,
one learns how to solve quadratic equations, such as
$x^2 - 7x + 12 = 0$. More generally, a polynomial equation
is one of the form

$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_2 x^2 + a_1 x + a_0 = 0.$$

To solve such an equation means to find a value of x 
for which the equation is true (or, better still, all such 
values). This may seem an obvious thing to say until one 
considers a very simple example such as the equation
$x^2 - 2 = 0$, or equivalently $x^2 = 2$. The solution to this is,
of course, $x = \pm\sqrt{2}$. What, though, is $\sqrt{2}$? It is defined to
be the positive number that squares to 2, but it does not
seem to be much of a “solution” to the equation $x^2 = 2$
to say that x is plus or minus the positive number that
squares to 2. Neither does it seem entirely satisfactory to
say that $x = 1.4142135\ldots$, since this is just the beginning
of a calculation that never finishes and does not
result in any discernible pattern. 

There are two lessons that can be drawn from this 
example. One is that what matters about an equation 
is often the existence and properties of solutions and 
not so much whether one can find a formula for them. 
Although we do not appear to learn anything when we 
are told that the solutions to the equation $x^2 = 2$ are
$x = \pm\sqrt{2}$, this assertion does contain within it a fact that
is not wholly obvious: that the number 2 has a square
root. This is usually presented as a consequence of the
intermediate value theorem (or another result of a similar
nature), which states that if $f$ is a continuous real-valued
function and $f(a)$ and $f(b)$ lie on either side of
0, then somewhere between a and b there must be a c
such that $f(c) = 0$. This result can be applied to the
function $f(x) = x^2 - 2$, since $f(1) = -1$ and $f(2) = 2$.
Therefore, there is some x between 1 and 2 such that
$x^2 - 2 = 0$, that is, $x^2 = 2$. For many purposes, the mere
existence of this x is enough, together with its defining 
properties of being positive and squaring to 2. 
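
The intermediate value theorem only asserts that such an x exists, but the same sign-change idea can be turned into a way of locating it. The sketch below uses repeated halving of the interval (the bisection method, a standard procedure that the text does not itself describe). The function name and the iteration count are assumptions.

```python
def bisect_root(f, a, b, iterations=60):
    """Locate a zero of a continuous f on [a, b], given that f(a) and f(b)
    lie on either side of 0."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must lie on either side of 0")
    for _ in range(iterations):
        mid = (a + b) / 2
        fmid = f(mid)
        if fmid == 0:
            return mid
        # Keep the half-interval across which the sign still changes.
        if (fmid < 0) == (fa < 0):
            a, fa = mid, fmid
        else:
            b = mid
    return (a + b) / 2

root = bisect_root(lambda x: x * x - 2, 1.0, 2.0)
print(root)  # approximately 1.41421356
```

Each step halves the interval known to contain a solution, so the digits 1.4142135... emerge one by one, although the process never yields more than an approximation.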

A similar argument tells us that all positive real numbers
have positive square roots. But the picture changes
when we try to solve more complicated quadratic equations.
Then we have two choices. Consider, for example,
the equation $x^2 - 6x + 7 = 0$. We could note that
$x^2 - 6x + 7$ is $-1$ when $x = 4$ and 2 when $x = 5$ and
deduce from the intermediate value theorem that the
equation has some solution between 4 and 5. However,
we do not learn as much from this as if we complete the
square, rewriting $x^2 - 6x + 7$ as $(x - 3)^2 - 2$. This allows
us to rewrite the equation as $(x - 3)^2 = 2$, which has the
two solutions $x = 3 \pm \sqrt{2}$. We have already established
that $\sqrt{2}$ exists and lies between 1 and 2, so not only do we
have a solution of $x^2 - 6x + 7 = 0$ that lies between 4 and
5, but we can see that it is closely related to, indeed built
out of, the solution to the equation $x^2 = 2$. This demonstrates
a second important aspect of equation solving,
which is that in many instances the explicit solubility of
an equation is a relative notion. If we are given a solution
to the equation $x^2 = 2$, we do not need any new input
from the intermediate value theorem to solve the more
complicated equation $x^2 - 6x + 7 = 0$: all we need is
some algebra. The solution, $x = 3 \pm \sqrt{2}$, is given by an
explicit expression, but inside that expression we have
$\sqrt{2}$, which is not defined by means of an explicit formula
but as a real number, with certain properties, that we can
prove to exist.

Solving polynomial equations of higher degree is 
markedly more difficult than solving quadratics, and 
raises fascinating questions. In particular, there are com- 
plicated formulas for the solutions of cubic and quartic 
equations, but the problem of finding corresponding for- 
mulas for quintic and higher-degree equations became 
one of the most famous unsolved problems in mathematics,
until Abel [VI.32] and Galois [VI.40] showed that
it could not be done. For more details about these mat- 
ters see THE INSOLUBILITY OF THE QUINTIC [V.24]. For 
another article related to polynomial equations see the 
FUNDAMENTAL THEOREM OF ALGEBRA [V.15]. 

1.3 Polynomial Equations in Several Variables 

Suppose that we are faced with an equation such as 
$$x^3 + y^3 + z^3 = 3x^2 y + 3y^2 z + 6xyz.$$

We can see straight away that there will be many solu- 
tions: if you fix x and y, then the equation is a cubic 
polynomial in z, and all cubics have at least one (real) 
solution. Therefore, for every choice of x and y there is 
some z such that the triple (x, y, z) is a solution of the 
above equation. 
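
This existence argument can be acted out numerically. The following Python sketch fixes x and y, treats the equation $x^3 + y^3 + z^3 = 3x^2 y + 3y^2 z + 6xyz$ as a cubic in z, and homes in on one real solution by interval halving. The method and the name `solve_for_z` are assumptions made for illustration; when the cubic has three real roots, the sketch simply returns whichever one the halving happens to reach.

```python
def solve_for_z(x, y, iterations=100):
    """Return one real z with x^3 + y^3 + z^3 = 3x^2 y + 3y^2 z + 6xyz."""
    def g(z):
        return x**3 + y**3 + z**3 - (3 * x**2 * y + 3 * y**2 * z + 6 * x * y * z)

    # A cubic with leading term z^3 is negative for very negative z and
    # positive for very positive z, so widen [-r, r] until the signs differ.
    r = 1.0
    while not (g(-r) < 0 < g(r)):
        r *= 2
    lo, hi = -r, r
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for x, y in [(1, 1), (2, -1), (0.5, 3)]:
    print((x, y, solve_for_z(x, y)))
```

Each printed triple is, up to rounding, a point of the solution set. Producing points one at a time like this says little about the overall shape of that set, which is the kind of question taken up next.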

Because the formula for the solution of a general cubic 
equation is rather complicated, a precise specification of 
the set of all triples (x, y, z) that solve the equation may 
not be very enlightening. However, one can learn a lot by
regarding this solution set as a geometric object (a
two-dimensional surface in space, to be precise) and asking
qualitative questions about it. One might, for instance,
wish to understand roughly what shape it is. Questions 
of this kind can be made precise using the language and 
concepts of topology [1.3 §6.4]. 

One can of course generalize further and consider 
simultaneous solutions to several polynomial equa- 
tions. Understanding the solution sets of such systems 
of equations is the province of algebraic geometry 
[IV.7].

1.4 Diophantine Equations 

As has been mentioned, the answer to the question 
of whether a particular equation has a solution varies 
according to where the solution is allowed to be. The 
equation $x^2 + 3 = 0$ has no solution if x is required to
be real, but in the complex numbers it has the two solutions
$x = \pm i\sqrt{3}$. The equation $x^2 + y^2 = 11$ has infinitely
many solutions if we are looking for x and y in the real
numbers, but none if they have to be integers. 
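
The claim about integers is easy to confirm by exhaustive search, since any integer solution of $x^2 + y^2 = 11$ must have both $|x|$ and $|y|$ at most 3. A minimal Python sketch follows; the function name is illustrative, and the second call, with 25 in place of 11, is included only for contrast.

```python
import math

def integer_solutions(n):
    """All integer pairs (x, y) with x^2 + y^2 == n, found by exhaustive search."""
    bound = math.isqrt(n)  # any solution must satisfy |x|, |y| <= sqrt(n)
    return [(x, y)
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)
            if x * x + y * y == n]

print(integer_solutions(11))  # [] : no integer solutions
print(integer_solutions(25))  # includes (3, 4) and (0, 5), among others
```

A finite search settles this particular equation because the solutions are confined to a bounded region. For most Diophantine equations no such bound is available.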

This last example is a typical Diophantine equation, 
the name given to an equation if one is looking for inte- 
ger solutions. The most famous Diophantine equation 
is the Fermat equation $x^n + y^n = z^n$, which is now
known, thanks to Andrew Wiles, to have no positive integer
solutions if n is greater than 2. (See Fermat’s last
theorem [V.12]. By contrast, the equation $x^2 + y^2 = z^2$
has infinitely many solutions.) A great deal of modern
algebraic number theory [IV.3] is concerned with Diophantine
equations, either directly or indirectly. As with
equations in the real and complex numbers, it is often 
fruitful to study the structure of sets of solutions to 
Diophantine equations: this investigation belongs to the 
area known as arithmetic geometry [IV.6]. 

A notable feature of Diophantine equations is that 
they tend to be extremely difficult. It is therefore natural 
to wonder whether there could be a systematic approach
to them. This question was the tenth in a famous list of
problems asked by Hilbert [VI.62] in 1900. It was not
until 1970 that Yuri Matiyasevitch, building on work by
Martin Davis, Julia Robinson, and Hilary Putnam, proved 
that the answer was no. (This is discussed further in the 
INSOLUBILITY OF THE HALTING PROBLEM [V.23].) 

An important step in the solution was taken in 1936, 
by Church and Turing [VI.92]. This was to make precise
the notion of a “systematic approach,” by formalizing 
(in two different ways) the notion of an algorithm (see 
ALGORITHMS [II.4 §3] and COMPUTATIONAL COMPLEXITY 
[IV.21 §1]). It was not easy to do this in the pre-computer 
age, but now we can restate the solution of Hilbert's
tenth problem as follows: there is no computer program
that can take as its input any Diophantine equation, and
without fail print “YES” if it has a solution and “NO”
otherwise. 

What does this tell us about Diophantine equations?
We can no longer dream of a final theory that will encompass
them all, so instead we are forced to restrict our
attention to individual equations or special classes of
equations, continually developing different methods for
solving them. This would make them uninteresting after
the first few, were it not for the fact that specific Diophantine
equations have remarkable links with very general
questions in other parts of mathematics. For example,
equations of the form $y^2 = f(x)$, where $f(x)$ is
a cubic polynomial in x, may look rather special, but
in fact the elliptic curves [III.21] that they define are
central to modern number theory, including the proof of 
Fermat’s last theorem. Of course, Fermat’s last theorem 
is itself a Diophantine equation, but its study has led to 


major developments in other parts of number theory. 
The correct moral to draw is perhaps this: solving a par- 
ticular Diophantine equation is fascinating and worth- 
while if, as is often the case, the result is more than a 
mere addition to the list of equations that have been 
solved. 

1.5 Differential Equations 

So far, we have looked at equations where the unknown 
is either a number or a point in n-dimensional space 
(that is, a sequence of n numbers). To generate 
these equations, we took various combinations of the 
basic arithmetical operations and applied them to our 
unknowns. 

Here, for comparison, are two well-known differential 
equations, the first “ordinary” and the second “partial”: 



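$$\frac{d^2 x}{dt^2} + k^2 x = 0,$$

$$\frac{\partial T}{\partial t} = \kappa \left( \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} \right).$$

The first is the equation for simple harmonic motion,
which has the general solution $x(t) = A \sin kt + B \cos kt$;
the second is the heat equation, which was discussed
in SOME FUNDAMENTAL MATHEMATICAL DEFINITIONS
[1.3 §5.4].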
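
The stated general solution of the equation for simple harmonic motion, $x(t) = A \sin kt + B \cos kt$, can be checked numerically: differentiating it twice should give $-k^2$ times the original function. The Python sketch below estimates the second derivative by a central finite difference. The constants A, B, and k, the step size, and the function names are arbitrary choices made for illustration.

```python
import math

def shm_solution(A, B, k):
    """Return the function x(t) = A sin(kt) + B cos(kt)."""
    return lambda t: A * math.sin(k * t) + B * math.cos(k * t)

def second_derivative(f, t, h=1e-4):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

A, B, k = 1.5, -0.7, 2.0  # arbitrary illustrative constants
x = shm_solution(A, B, k)
for t in (0.0, 0.3, 1.0, 2.5):
    residual = second_derivative(x, t) + k * k * x(t)
    print(t, residual)  # each residual should be close to zero
```

The residuals are not exactly zero because the derivative is only approximated, but they are tiny compared with the size of x(t) itself.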

For many reasons, differential equations represent a 
jump in sophistication. One is that the unknowns are 
functions, which are much more complicated objects
than numbers or n-dimensional points. (For example,
the first equation above asks what function x of t has
the property that if you differentiate it twice then you
get $-k^2$ times the original function.) A second is that
the basic operations one performs on functions include
differentiation and integration, which are considerably
less “basic” than addition and multiplication. A third is
that differential equations that can be solved in “closed
form,” that is, by means of a formula for the unknown
function $f$, are the exception rather than the rule, even
when the equations are natural and important. 

Consider again the first equation above. Suppose that, 
given a function $f$, we write $\phi(f)$ for the function
$(d^2 f/dt^2) + k^2 f$. Then $\phi$ is a linear map, in the sense
that $\phi(f + g) = \phi(f) + \phi(g)$ and $\phi(af) = a\phi(f)$ for
any constant a. This means that the differential equation
can be regarded as something like a matrix equation,
but generalized to infinitely many dimensions. The
heat equation has the same property: if we define $\psi(T)$
to be

$$\frac{\partial T}{\partial t} - \kappa \left( \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} \right),$$

then $\psi$ is another linear map. Such differential equations
are called linear, and the link with linear algebra makes 
them markedly easier to solve. (A very useful tool for 
this is THE FOURIER TRANSFORM [III.27].) 

What about the more typical equations, the ones that 
cannot be solved in closed form? Then the focus shifts 
once again toward establishing whether or not solutions
exist, and if so what properties they have. As with poly- 
nomial equations, this can depend on what you count 
as an allowable solution. Sometimes we are in the position
we were in with the equation $x^2 = 2$: it is not too
hard to prove that solutions exist and all that is left
to do is name them. A simple example is the equation
$dy/dx = e^{-x^2/2}$. In a certain sense, this cannot be solved:
it can be shown that there is no function built out of
polynomials, exponentials [III.25], and trigonometric
functions [III.93] that differentiates to $e^{-x^2/2}$. However,
in another sense the equation is easy to solve:
all you have to do is integrate the function $e^{-x^2/2}$. The
resulting function (when divided by $\sqrt{2\pi}$) is the normal
distribution [III.73 §5] function. The normal distribution
is of fundamental importance in probability, so the
function is given a name, $\Phi$.
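
To see what "naming" a solution amounts to in practice, the sketch below computes the normal distribution function in two ways. It assumes the integrand is $e^{-t^2/2}$, so that the normalizing factor is $\sqrt{2\pi}$. The first function integrates numerically from a cutoff of -10, standing in for minus infinity. The second uses the error function `math.erf` from the Python standard library, which is itself just a named integral of the same kind. The function names, the cutoff, and the step count are assumptions.

```python
import math

def phi_by_integration(x, lower=-10.0, n=100000):
    """Integrate exp(-t^2/2) from `lower` to x by the midpoint rule,
    then divide by sqrt(2*pi). The tail below -10 is negligible."""
    h = (x - lower) / n
    total = sum(math.exp(-((lower + (i + 0.5) * h) ** 2) / 2) for i in range(n))
    return total * h / math.sqrt(2 * math.pi)

def phi_by_erf(x):
    """The same function expressed through the library's error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, phi_by_integration(x), phi_by_erf(x))
```

The two columns agree to many decimal places. Once a function is important enough to be named and tabulated, an answer expressed in terms of it is treated as a perfectly good solution.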

In most situations, there is no hope of writing down 
a formula for a solution, even if one allows oneself to 
integrate “known” functions. A famous example is the 
so-called three-body problem [V.36]: given three bod- 
ies moving in space and attracted to each other by grav- 
itational forces, how will they continue to move? Using 
Newton’s laws, one can write down some differential 
equations that describe this situation. Newton [VI.13]
solved the corresponding equations for two bodies, and 
thereby explained why planets move in elliptical orbits 
around the Sun, but for three or more bodies they proved 
very hard indeed to solve. It is now known that there 
was a good reason for this: the equations can lead to 
chaotic behavior (see Dynamics [IV.15] for more about 
chaos). However, this opens up a new and very inter- 
esting avenue of research into questions of chaos and 
stability. 

Sometimes there are ways of proving that solutions 
exist even if they cannot be easily specified. Then
one may ask not for precise formulas, but for general 
descriptions. For example, if the equation has a time 
dependence (as, for instance, the heat equation and wave 
equations have), one can ask whether solutions tend 
to decay over time, or blow up, or remain roughly the 
same. These more qualitative questions concern what is 
known as asymptotic behavior, and there are techniques 
for answering some of them even when a solution is not 
given by a tidy formula. 


As with Diophantine equations, there are some special
and important classes of partial differential equations, 
including nonlinear ones, that can be solved exactly. 
This gives rise to a very different style of research: again 
one is interested in properties of solutions, but now 
these properties may be more algebraic in nature, in the 
sense that exact formulas will play a more important 
role. See linear and nonlinear waves and solitons 
[III.51].

2 Classifying 

If one is trying to understand a new mathematical struc- 
ture, such as a group [1.3 §2.1] or a manifold [1.3 §6.9], 
one of the first tasks is to come up with a good sup- 
ply of examples. Sometimes examples are very easy to 
find, in which case there may be a bewildering array of 
them that cannot be put into any sort of order. Often, 
however, the conditions that an example must satisfy 
are quite stringent, and then it may be possible to come 
up with something like an infinite list that includes every 
single one. For example, it can be shown that any vector
space [1.3 §2.3] of dimension n over a field $F$ is isomorphic
to $F^n$. This means that just one nonnegative integer, n,
is enough to determine the space completely. In this case
our “list” will be $\{0\}, F, F^2, F^3, F^4, \dots$. In such a situation
we say that we have a classification of the mathematical
structure in question. 

Classifications are very useful because if we can classify
a mathematical structure then we have a new way of
proving results about that structure: instead of deducing 
a result from the axioms that the structure is required 
to satisfy, we can simply check that it holds for every 
example on the list, confident in the knowledge that we
have thereby proved it in general. This is not always eas- 
ier than the more abstract, axiomatic approach, but it 
certainly is sometimes. Indeed, there are several results 
proved using classifications that nobody knows how to 
prove in any other way. More generally, the more exam- 
ples you know of a mathematical structure, the easier 
it is to think about that structure— testing hypotheses, 
finding counterexamples, and so on. If you know all the 
examples of the structure, then for some purposes your 
understanding is complete. 

2.1 Identifying Building Blocks and Families 

There are two situations that typically lead to interesting 
classification theorems. The boundary between them is 
somewhat blurred, but the distinction is clear enough to 
be worth making, so we shall discuss them separately in 
this subsection and the next. 





As an example of the first kind of situation, let us 
look at objects called regular polytopes. Polytopes are 
polygons, polyhedra, and their higher-dimensional gen- 
eralizations. The regular polygons are those for which 
all sides have the same length and all angles are equal, 
and the regular polyhedra are those for which all faces 
are congruent regular polygons and every vertex has the 
same number of edges coming out of it. More generally, 
a higher-dimensional polytope is regular if it is as sym- 
metrical as possible, though the precise definition of this 
is somewhat complicated. (Here, in three dimensions, is 
a definition that turns out to be equivalent to the one just 
given but easier to generalize. A flag is a triple (v,e,f) 
where v is a vertex of the polyhedron, e is an edge containing
v, and f is a face containing e. A polyhedron is
regular if for any two flags $(v, e, f)$ and $(v', e', f')$ there
is a symmetry of the polyhedron that takes v to $v'$, e to
$e'$, and f to $f'$.)

It is easy to see what the regular polygons are in two 
dimensions: for every k greater than 2 there is exactly 
one regular k-gon and that is all there is. In three dimen-
sions, the regular polyhedra are the famous Platonic
solids, that is, the tetrahedron, the cube, the octahedron, 
the dodecahedron, and the icosahedron. It is not too 
hard to see that there cannot be any more regular poly- 
hedra, since there must be at least three faces meeting 
at every vertex, and the angles at that vertex must add 
up to less than 360°. This constraint means that the only 
possibilities for the faces at a vertex are three, four, or 
five triangles, three squares, or three pentagons. These 
give the tetrahedron, the octahedron, the icosahedron, 
the cube, and the dodecahedron, respectively. 
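
The angle constraint can be checked mechanically. The following Python sketch (the function name and the search bounds are our own illustrative choices, not part of the argument above) lists every pair (p, q) for which q regular p-gons fit around a vertex with total angle less than 360°.

```python
from fractions import Fraction

def vertex_configurations(max_sides=12, max_faces=12):
    """Return pairs (p, q): q regular p-gons can meet at a convex vertex."""
    found = []
    for p in range(3, max_sides + 1):
        # Interior angle of a regular p-gon in degrees, kept exact.
        interior = Fraction(180 * (p - 2), p)
        for q in range(3, max_faces + 1):  # at least three faces per vertex
            if q * interior < 360:
                found.append((p, q))
    return found

print(vertex_configurations())
# [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```

The five pairs correspond to the five vertex types listed above. The search shows only that no other vertex type is possible; that each type is actually realized by a polyhedron is a separate fact.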

Some of the polygons and polyhedra just defined have 
natural higher-dimensional analogues. For example, if 
you take n + 1 points in $\mathbb{R}^n$ all at the same distance
from one another, then they form the vertices of a reg-
ular simplex, which is an equilateral triangle or regu-
lar tetrahedron when n = 2 or 3. The set of all points
$(x_1, x_2, \dots, x_n)$ with $0 \le x_i \le 1$ for every i forms
the n-dimensional analogue of a unit square or cube.
The octahedron can be defined as the set of all points
(x, y, z) in $\mathbb{R}^3$ such that $|x| + |y| + |z| \le 1$, and the
analogue of this in n dimensions is the set of all points
$(x_1, x_2, \dots, x_n)$ such that $|x_1| + \cdots + |x_n| \le 1$.

It is not obvious how the dodecahedron and icosa-
hedron would lead to infinite families of regular poly-
topes, and it turns out that they do not. In fact, apart
from three more examples in four dimensions, the above 
polytopes constitute a complete list. These three exam-
ples are quite remarkable. One of them has 120 “three-
dimensional faces,” each of which is a regular dodec-
ahedron. It has a so-called dual, which has 600 regu-
lar tetrahedra as its “faces.” The third example can be
described in terms of coordinates: its vertices are the six-
teen points of the form (±1, ±1, ±1, ±1), together with
the eight points (±2,0,0,0), (0,±2,0,0), (0,0,±2,0),
and (0,0,0,±2).
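
As a sanity check on these coordinates, here is a small Python sketch (the helper names are ours) that builds the twenty-four points and confirms that they all lie at the same distance from the origin and that every point has the same number of nearest neighbours. This is consistent with the high degree of symmetry claimed, though of course it does not prove regularity.

```python
from itertools import combinations, product

def vertices_third_polytope():
    """The 16 points (±1, ±1, ±1, ±1) together with the 8 points (±2, 0, 0, 0), etc."""
    cube_like = [tuple(p) for p in product((1, -1), repeat=4)]
    axis_like = []
    for k in range(4):
        for s in (2, -2):
            v = [0, 0, 0, 0]
            v[k] = s
            axis_like.append(tuple(v))
    return cube_like + axis_like

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

verts = vertices_third_polytope()
origin = (0, 0, 0, 0)
norms = {squared_distance(v, origin) for v in verts}
min_d2 = min(squared_distance(p, q) for p, q in combinations(verts, 2))
# For each vertex, count the vertices at the minimum distance from it.
neighbours = {sum(1 for q in verts if squared_distance(p, q) == min_d2) for p in verts}

print(len(verts), norms, min_d2, neighbours)
# 24 {4} 4 {8}
```

So each vertex is at distance 2 from the origin and has exactly eight nearest neighbours, also at distance 2.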

The theorem that these are all the regular polytopes 
is significantly harder to prove than the result sketched 
above for three dimensions. The complete list was 
obtained by Schläfli in the mid nineteenth century; the
first proof that there are no others was given by Donald 
Coxeter in 1969. 

We therefore know that the regular polytopes 
in dimensions three and higher fall into three 
families— the n-dimensional versions of the tetra- 
hedron, cube, and octahedron— together with five 
“exceptional” examples— the dodecahedron, the icosa- 
hedron, and the three four-dimensional polytopes just 
described. This situation is typical of many classification 
theorems. The exceptional examples, often called “spo- 
radic,” tend to have a very high degree of symmetry— it 
is almost as if we have no right to expect this degree 
of symmetry to be possible, but just occasionally by a 
happy chance it is. The families and sporadic examples 
that occur in different classification results are often 
closely related, and this can be a sign of deep connec- 
tions between areas that do not at first appear to be 
connected at all. 

Sometimes one does not try to classify all mathemat- 
ical structures of a given kind, but instead identifies a 
certain class of “basic” structures out of which all the 
others can be built in a simple way. A good analogy for 
this is the set of primes, out of which all other integers 
can be built as products. Finite groups, for example, are 
all “products” of certain basic groups that are called sim- 
ple. THE CLASSIFICATION OF FINITE SIMPLE GROUPS [V.8], 
one of the most famous theorems of twentieth-century 
mathematics, is discussed in part V. 

For more on this style of classification theorem, see 
also LIE THEORY [III.50]. 

2.2 Equivalence, Nonequivalence, and Invariants 

There are many situations in mathematics where two 
objects are, strictly speaking, different, but where we
are not interested in the difference. In such situations 
we want to regard the objects as “essentially the same,” 
or “equivalent.” Equivalence of this kind is expressed 
formally by the notion of an equivalence relation
[I.2 §2.3].

For example, a topologist regards two shapes as essen-
tially the same if one is a continuous deformation of
the other, as we saw in [I.3 §6.4]. As pointed out there, a
sphere is the same as a cube in this sense, and one can 
also see that the surface of a doughnut, that is, a torus, 
is essentially the same as the surface of a teacup. (To 
turn the teacup into a doughnut, let the handle expand 
while the cup part is gradually swallowed up into it.) It is 
equally obvious, intuitively speaking, that a sphere is not 
essentially the same as a torus, but this is much harder 
to prove. 

Why should nonequivalence be harder to prove than 
equivalence? The answer is that in order to show that 
two objects are equivalent, all one has to do is find 
a single transformation that demonstrates this equiva- 
lence. However, to show that two objects are not equiv- 
alent, one must somehow consider all possible transfor- 
mations and show that not one of them works. How can 
one rule out the existence of some wildly complicated 
continuous deformation that is impossible to visualize 
but happens, remarkably, to turn a sphere into a torus? 

Here is a sketch of a proof. The sphere and the torus 
are examples of compact orientable surfaces, which 
means, roughly speaking, two-dimensional shapes that 
occupy a finite portion of space and have no boundary. 
Given any such surface, one can find an equivalent sur- 
face that is built out of triangles and is topologically the 
same. Here is a famous theorem of Euler [VI.18].

Let P be a polyhedron that is topologically the same as a
sphere, and suppose that it has V vertices, E edges, and
F faces. Then V - E + F = 2.

For example, if P is an icosahedron, then it has twelve
vertices, thirty edges, and twenty faces, and 12 - 30 + 20
is indeed equal to 2. 
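
The same check is easily carried out for all five Platonic solids. In the sketch below the vertex, edge, and face counts are the standard ones; the dictionary and function names are our own.

```python
# (V, E, F) for each Platonic solid.
PLATONIC_SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}

def euler_expression(v, e, f):
    """The quantity V - E + F from Euler's theorem."""
    return v - e + f

for name, (v, e, f) in PLATONIC_SOLIDS.items():
    print(f"{name}: {v} - {e} + {f} = {euler_expression(v, e, f)}")
```

Every line of the output ends in 2, as the theorem predicts.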

For this theorem, it is not in fact important that the tri-
angles are flat: we can draw them on the original sphere,
except that now they are spherical triangles. It is just 
as easy to count vertices, edges, and faces when we do 
this, and the theorem is still valid. A network of trian- 
gles drawn on a sphere is called a triangulation of the 
sphere. 

Euler's theorem tells us that V - E + F = 2 regardless of
what triangulation of the sphere we take. Moreover, the 
formula is still valid if the surface we triangulate is not a 
sphere but another shape that is topologically equivalent 
to the sphere, since triangulations can be continuously 
deformed without V, E, or F changing. 

More generally, one can triangulate any surface, and
evaluate V - E + F. The result is called the Euler num-
ber of that surface. For this definition to make sense,
we need the following fact, which is a generalization of
Euler’s theorem (and which is not much harder to prove
than the original result).


(i) Although a surface can be triangulated in many
ways, the quantity V - E + F will be the same for
all triangulations.

If we continuously deform the surface and continuously
deform one of its triangulations at the same time, we
can deduce that the Euler number of the new surface is
the same as that of the old one. In other words, fact (i)
above has the following interesting consequence.

(ii) If two surfaces are continuous deformations of each
other, then they have the same Euler number. 

This gives us a potential method for showing that sur- 
faces are not equivalent: if they have different Euler 
numbers then we know from the above that they are 
not continuous deformations of each other. The Euler 
number of the torus turns out to be 0 (as one can show 
by calculating V - E + F for any triangulation), and that
completes the proof that the sphere and the torus are 
not equivalent. 
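
To make the calculation concrete, here is a Python sketch that computes V - E + F directly from a list of triangles. The two triangulations used are our own illustrative choices: the surface of an octahedron (for the sphere) and an n × n grid of squares with opposite sides identified, each square cut along a diagonal (for the torus, with n at least 3 so that distinct triangles have distinct vertex sets).

```python
def euler_number(triangles):
    """V - E + F for a triangulation given as a list of vertex triples."""
    vertices, edges, faces = set(), set(), set()
    for a, b, c in triangles:
        vertices.update((a, b, c))
        # An edge is an unordered pair of vertices.
        edges.update(frozenset(pair) for pair in ((a, b), (b, c), (a, c)))
        faces.add(frozenset((a, b, c)))
    return len(vertices) - len(edges) + len(faces)

def octahedron_triangulation():
    """Eight faces, each using one vertex from each coordinate axis."""
    return [(x, y, z) for x in ("+x", "-x") for y in ("+y", "-y") for z in ("+z", "-z")]

def torus_triangulation(n=3):
    """An n x n grid with wrap-around, each square split into two triangles."""
    triangles = []
    for i in range(n):
        for j in range(n):
            a, b = (i, j), ((i + 1) % n, j)
            c, d = (i, (j + 1) % n), ((i + 1) % n, (j + 1) % n)
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return triangles

print(euler_number(octahedron_triangulation()))  # 2
print(euler_number(torus_triangulation(3)))      # 0
print(euler_number(torus_triangulation(5)))      # 0
```

For the n × n torus grid the counts are V = n², E = 3n², and F = 2n², so the answer is 0 whatever the value of n, in line with fact (i).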

The Euler number is an example of an invariant. This
means a function $\phi$, the domain of which is the set of
all objects of the kind one is studying, with the prop-
erty that if X and Y are equivalent objects, then $\phi(X) = \phi(Y)$.
To show that X is not equivalent to Y, it is enough
to find an invariant $\phi$ for which $\phi(X)$ and $\phi(Y)$ are
different. Sometimes the values $\phi$ takes are numbers
(as with the Euler number), but often they will be more
complicated objects such as polynomials or groups.

It is perfectly possible for $\phi(X)$ to equal $\phi(Y)$ even
when X and Y are not equivalent. An extreme example
would be the invariant $\phi$ that simply took the value 0
for every object X. However, sometimes it is so hard 
to prove that objects are not equivalent that invariants 
can be considered useful and interesting even when they 
work only part of the time. 

There are two main properties that one looks for in
an invariant $\phi$, and they tend to pull in opposite direc-
tions. One is that it should be as fine as possible: that
is, as often as possible $\phi(X)$ and $\phi(Y)$ are different if X
and Y are not equivalent. The other is that as often as
possible one should actually be able to establish when
$\phi(X)$ is different from $\phi(Y)$. There is not much use in
having a fine invariant if it is impossible to calculate.
(An extreme example would be the “trivial” invariant
that simply mapped each X to its equivalence class. It
is as fine as possible, but unless we have some indepen-
dent means of specifying it, then it does not represent
an advance on the original problem of showing that two
objects are not equivalent.) The most powerful invari-
ants therefore tend to be ones that can be calculated,
but not very easily.

In the case of compact orientable surfaces, we are 
lucky: not only is the Euler number an invariant that is 
easy to calculate, but it also classifies the compact ori- 
entable surfaces completely. To be precise, k is the Euler 
number of a compact orientable surface if and only if it 
is of the form 2 - 2g for some nonnegative integer g (so
the possible Euler numbers are 2, 0, -2, -4, ...), and two
compact orientable surfaces with the same Euler number 
are equivalent. Thus, if we regard equivalent surfaces as 
the same, then the number g gives us a complete speci- 
fication of a surface. It is called the genus of the surface, 
and can be interpreted geometrically as the number of 
“holes” the surface has (so the genus of the sphere is 0 
and that of the torus is 1). 

For other examples of invariants, see algebraic 
topology [IV.10] and knot polynomials [III.46].

3 Generalizing 

When an important mathematical definition is formu- 
lated, or theorem proved, that is rarely the end of the 
story. However clear a piece of mathematics may seem, 
it is nearly always possible to understand it better, and 
one of the most common ways of doing so is to present 
it as a special case of something more general. There 
are various different kinds of generalization, of which 
we discuss a few here. 

3.1 Weakening Hypotheses and Strengthening 
Conclusions 

The number 1729 is famous for being expressible as the 
sum of two cubes in two different ways: it is $1^3 + 12^3$ and
also $9^3 + 10^3$. Let us now try to decide whether there is
a number that can be written as the sum of four cubes 
in ten different ways. 
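
For two cubes, a direct search is perfectly feasible. The following Python sketch (our own helper, with an arbitrary search limit of 2000) finds every number up to the limit that is a sum of two positive cubes in more than one way.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

def two_cube_sums(limit):
    """Map each n <= limit to the pairs (a, b), a <= b, with a^3 + b^3 = n."""
    representations = defaultdict(list)
    top = round(limit ** (1 / 3)) + 1  # safely above the cube root of limit
    for a, b in combinations_with_replacement(range(1, top + 1), 2):
        n = a**3 + b**3
        if n <= limit:
            representations[n].append((a, b))
    return representations

multi = {n: pairs for n, pairs in two_cube_sums(2000).items() if len(pairs) >= 2}
print(multi)
# {1729: [(1, 12), (9, 10)]}
```

A search of this kind quickly becomes impractical for four cubes and ten representations, which is what motivates the different approach taken below.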

At first this problem seems alarmingly difficult. It is 
clear that any such number, if it exists, must be very 
large and would be extremely tedious to find if we simply 
tested one number after another. So what can we do that 
is better than this? 

The answer turns out to be that we should weaken 
our hypotheses. The problem we wish to solve is of 
the following general kind. We are given a sequence 
$a_1, a_2, a_3, \dots$ of positive integers and we are told that it
has a certain property. We must then prove that there 
is a positive integer that can be written as a sum of 
four terms of the sequence in ten different ways. This 
is perhaps an artificial way of thinking about the prob-
lem since the property we assume of the sequence is the
property of “being the sequence of cubes,” which is so
specific that it is more natural to think of it as an identi-
fication of the sequence. However, this way of thinking
encourages us to consider the possibility that the conclu- 
sion might be true for a much wider class of sequences. 
And indeed this turns out to be the case. 

There are a thousand cubes less than or equal to 
1 000 000 000. We shall now see that this property alone 
is sufficient to guarantee that there is a number that can 
be written as the sum of four cubes in ten different ways. 
That is, if $a_1, a_2, a_3, \dots$ is any sequence of positive inte-
gers, and if none of the first thousand terms exceeds
1 000 000 000, then some number can be written as the 
sum of four terms of the sequence in ten different ways. 

To prove this, all we have to do is notice that the num-
ber of different ways of choosing four distinct terms
from the sequence $a_1, a_2, \dots, a_{1000}$ is
$1000 \times 999 \times 998 \times 997 / 24$, which is greater than
$40 \times 1\,000\,000\,000$. The
sum of any four terms of the sequence cannot exceed
$4 \times 1\,000\,000\,000$. It follows that the average number of
ways of writing one of the first 4 000 000 000 numbers 
as the sum of four terms of the sequence is at least ten. 
But if the average number of representations is at least 
ten, then there must certainly be numbers that have at 
least this number of representations. 
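
The arithmetic in this argument is easy to confirm, and the argument itself can be watched in action on a scaled-down instance. In the Python sketch below, the function names, the scaled-down parameters (twenty terms, none exceeding 100), and the random seed are our own illustrative assumptions.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb

def guaranteed_representations(num_terms, max_term, k):
    """Pigeonhole bound: some number is a sum of k distinct terms in at least this many ways."""
    choices = comb(num_terms, k)   # ways to choose k distinct terms
    targets = k * max_term         # every sum lies between 1 and k * max_term
    return -(-choices // targets)  # ceiling of the average

def most_represented_sum(terms, k):
    """Return (sum, count) for a sum of k distinct terms with the most representations."""
    counts = Counter(sum(c) for c in combinations(terms, k))
    return counts.most_common(1)[0]

# The full-size problem: a thousand terms, none exceeding 1 000 000 000.
assert comb(1000, 4) > 40 * 10**9
print(guaranteed_representations(1000, 10**9, 4))

# A scaled-down instance that can be checked exhaustively.
terms = random.Random(0).sample(range(1, 101), 20)
total, ways = most_represented_sum(terms, 4)
print(total, ways, guaranteed_representations(20, 100, 4))
```

Whatever twenty terms are chosen, the observed maximum is at least the guaranteed bound, because the bound depends only on how many terms there are and how large they can be.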

Why did it help to generalize the problem in this way? 
One might think that it would be harder to prove a result 
if one assumed less. However, that is often not true. The 
less you assume, the fewer options you have when try- 
ing to use your assumptions, and that can speed up the 
search for a proof. Had we not generalized the prob- 
lem above, we would have had too many options. For 
instance, we might have found ourselves trying to solve 
very difficult Diophantine equations involving cubes 
rather than noticing the easy counting argument. In a 
way, it was only once we had weakened our hypotheses 
that we understood the true nature of the problem. 

We could also think of the above generalization as a 
strengthening of the conclusion: the problem asks for 
a statement about cubes, and we prove not just that 
but much more besides. There is no clear distinction
between weakening hypotheses and strengthening con-
clusions, since if we are asked to prove a statement of the
form $P \Rightarrow Q$, we can always reformulate it as $\neg Q \Rightarrow \neg P$.
Then, if we weaken P we are weakening the hypotheses
of $P \Rightarrow Q$ but strengthening the conclusion of $\neg Q \Rightarrow \neg P$.
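
That a statement and its contrapositive say the same thing can be confirmed by running through the four possible truth assignments, as in this short Python sketch (the function `implies` is our own helper).

```python
from itertools import product

def implies(p, q):
    """Truth value of 'p implies q'."""
    return (not p) or q

for p, q in product((False, True), repeat=2):
    # Compare P => Q with (not Q) => (not P) for this assignment.
    print(p, q, implies(p, q), implies(not q, not p))

equivalent = all(
    implies(p, q) == implies(not q, not p)
    for p, q in product((False, True), repeat=2)
)
print(equivalent)  # True
```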

3.2 Proving a More Abstract Result 

A famous result in modular arithmetic, known as Fer-
mat’s little theorem (see modular arithmetic [III.60]),
states that if p is a prime and a is not a multiple of p,
then $a^{p-1}$ leaves a remainder of 1 when you divide by p.
That is, $a^{p-1}$ is congruent to 1 mod p.

There are several proofs of this result, one of which 
is a good illustration of a certain kind of generalization. 
Here is the argument in outline. The first step is to show 
that the numbers 1, 2, ..., p - 1 form a group [I.3 §2.1]
under multiplication mod p. (This means multiplication
followed by taking the remainder on division by p. For
example, if p = 7 then the “product” of 3 and 6 is 4, since
4 is the remainder when you divide 18 by 7.) The next
step is to note that if $1 \le a \le p - 1$ then the powers of
a (mod p) form a subgroup of this group. Moreover, the
size of the subgroup is the smallest positive integer m
such that $a^m$ is congruent to 1 mod p. One then applies
Lagrange’s theorem, which states that the size of a group
is always divisible by the size of any of its subgroups.
In this case, the size of the group is p - 1, from which
it follows that p - 1 is divisible by m, say p - 1 = mk. But
then, since $a^m = 1$ in this group, it follows that
$a^{p-1} = (a^m)^k = 1$.
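
The steps of this outline can be followed numerically. The Python sketch below (with our own function name, and p = 7 chosen to match the example above) computes the size m of the subgroup generated by each a and checks both that m divides p - 1 and that the conclusion of Fermat's little theorem holds.

```python
def multiplicative_order(a, p):
    """Smallest m >= 1 with a^m congruent to 1 mod p (p prime, a not a multiple of p)."""
    m, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        m += 1
    return m

p = 7
print((3 * 6) % p)  # 4, the "product" of 3 and 6 mod 7

for a in range(1, p):
    m = multiplicative_order(a, p)
    assert (p - 1) % m == 0       # Lagrange: m divides p - 1
    assert pow(a, p - 1, p) == 1  # Fermat's little theorem
    print(a, m)
```

For p = 7 the orders of 1, 2, ..., 6 are 1, 3, 6, 3, 6, 2, each of which divides 6.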

This argument shows that Fermat's little theorem is, 
when viewed appropriately, just one special case of 
Lagrange’s theorem. (The word “just” is, however, a lit- 
tle misleading, because it is not wholly obvious that the 
integers mod p form a group in the way stated. This fact
is proved using Euclid’s algorithm [III.22].)

Fermat could not have viewed his theorem in this way, 
since the concept of a group had not been invented when 
he proved it. Thus, the abstract concept of a group helps 
one to see Fermat’s little theorem in a completely new 
way: it can be viewed as a special case of a more general 
result, but a result that cannot even be stated until one 
has developed some new, abstract concepts. 

This process of abstraction has many benefits. Most 
obviously, it provides us with a more general theorem, 
one that has many other interesting particular cases. 
Once we see this, then we can prove the general result 
once and for all rather than having to prove each case 
separately. A related benefit is that it enables us to see
connections between results that may originally have
seemed quite different. And finding surprising connec-
tions between different areas of mathematics almost
always leads to significant advances in the subject.

3.3 Identifying Characteristic Properties 

There is a marked contrast between the way one defines
$\sqrt{2}$ and the way one defines $\sqrt{-1}$, or i as it is usually
written. In the former case one begins, if one is being
careful, by proving that there is exactly one positive real
number that squares to 2. Then $\sqrt{2}$ is defined to be this
number.


This style of definition is impossible for i since there 
is no real number that squares to -1. So instead one 
asks the following question: if there were a number that 
squared to -1, what could one say about it? Such a num-
ber would not be a real number, but that does not rule
out the possibility of extending the real number system 
to a larger system that contains a square root of -1. 

At first it may seem as though we know precisely one
thing about i: that $i^2 = -1$. But if we assume in addition
that i obeys the normal rules of arithmetic, then we can
do more interesting calculations, such as

$(i + 1)^2 = i^2 + 2i + 1 = -1 + 2i + 1 = 2i,$
which implies that $(i + 1)/\sqrt{2}$ is a square root of i.
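
This calculation can be reproduced with Python's built-in complex numbers, which implement exactly these two assumptions (the literal `1j` plays the role of i). The sketch is only a numerical illustration; the second check is approximate because $\sqrt{2}$ is represented in floating point.

```python
import cmath
import math

i = 1j  # Python's imaginary unit

square = (i + 1) * (i + 1)
print(square)  # 2j

root = (i + 1) / math.sqrt(2)
print(root * root)                    # very close to 1j
print(cmath.isclose(root * root, i))  # True
```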

From these two simple assumptions (that $i^2 = -1$
and that i obeys the usual rules of arithmetic) we can
develop the entire theory of complex numbers [I.3 §1.5]
without ever having to worry about what i actually is.
And in fact, once you stop to think about it, the exis-
tence of $\sqrt{2}$, though reassuring, is not in practice any-
thing like as important as its defining properties, which
are very similar to those of i: it squares to 2 and obeys
the usual rules of arithmetic.

Many important mathematical generalizations work 
in a similar way. Another example is the definition of 
$x^a$ when x and a are real numbers with x positive. It
is difficult to make sense of this expression in a direct
way unless a is a positive integer, and yet mathemati-
cians are completely comfortable with it, whatever the
value of a. How can this be? The answer is that what
really matters about $x^a$ is not its numerical value but its
characteristic properties when one thinks of it as a func-
tion of a. The most important of these is the property
that $x^{a+b} = x^a x^b$. Together with a couple of other sim-
ple properties, this completely determines the function
$x^a$. More importantly, it is these characteristic proper-
ties that one uses when reasoning about $x^a$. This exam-
ple is discussed in more detail in the exponential and
logarithmic functions [III.25].
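
To see how much the property $x^{a+b} = x^a x^b$ forces, here is a short derivation for rational exponents. It is a sketch under two extra assumptions of ours, standing in for the "other simple properties" mentioned above: that $x^1 = x$ and that $x^a$ is always positive.

```latex
\begin{align*}
x^{0}\,x^{a} = x^{0+a} = x^{a}
  &\;\Longrightarrow\; x^{0} = 1, \\
x^{a}\,x^{-a} = x^{0} = 1
  &\;\Longrightarrow\; x^{-a} = 1/x^{a}, \\
\underbrace{x^{1/q}\cdots x^{1/q}}_{q\ \text{factors}} = x^{1} = x
  &\;\Longrightarrow\; x^{1/q} = \sqrt[q]{x}, \\
x^{p/q} = \underbrace{x^{1/q}\cdots x^{1/q}}_{p\ \text{factors}}
  &= \bigl(\sqrt[q]{x}\bigr)^{p}.
\end{align*}
```

Thus the value of $x^a$ is determined for every rational a. Extending this to irrational a requires some further condition, such as continuity or monotonicity in a.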

There is an interesting relationship between abstrac-
tion and classification. The word “abstract” is often used
to refer to a part of mathematics where it is more com-
mon to use characteristic properties of an object than it
is to argue directly from a definition of the object itself
(though, as the example of $\sqrt{2}$ shows, this distinction
can be somewhat hazy). The ultimate in abstraction is to
explore the consequences of a system of axioms, such as
those for a group or a vector space. However, sometimes,
in order to reason about such algebraic structures, it is
very helpful to classify them, and the result of classifica-
tion is to make them more concrete again. For instance,
every finite-dimensional real vector space V is isomor-
phic to $\mathbb{R}^n$ for some nonnegative integer n, and it is
sometimes helpful to think of V as the concrete object
$\mathbb{R}^n$, rather than as an algebraic structure that satisfies
certain axioms. Thus, in a certain sense, classification is 
the opposite of abstraction. 

3.4 Generalization after Reformulation 

Dimension is a mathematical idea that is also a famil- 
iar part of everyday language: for example, we say that 
a photograph of a chair is a two-dimensional represen- 
tation of a three-dimensional object, because the chair 
has height, breadth, and depth, but the image just has 
height and breadth. Roughly speaking, the dimension of 
a shape is the number of independent directions one can 
move about in while staying inside the shape, and this 
rough conception can be made mathematically precise 
(using the notion of a vector space [I.3 §2.3]).

If we are given any shape, then its dimension, as one 
would normally understand it, must be a nonnegative 
integer: it does not make much sense to say that one can 
move about in 1.4 independent directions, for example. 
And yet there is a rigorous mathematical theory of frac- 
tional dimension, in which for every nonnegative real 
number d you can find many shapes of dimension d. 

How do mathematicians achieve the seemingly impos- 
sible? The answer is that they reformulate the concept of 
dimension and only then do they generalize it. What this 
means is that they give a new definition of dimension 
with the following two properties. 

(i) For all “simple” shapes the new definition agrees 
with the old one. For example, under the new defi- 
nition a line will still be one dimensional, a square 
two dimensional, and a cube three dimensional. 

(ii) With the new definition it is no longer obvious that 
the dimension of every shape must be a positive 
integer. 

There are several ways of doing this, but most of them 
focus on the differences between length, area, and vol- 
ume. Notice that a line segment of length 2 can be 
expressed as a union of two nonoverlapping line seg- 
ments of length 1, a square of side-length 2 can be 
expressed as a union of four nonoverlapping squares 
of side-length 1, and a cube of side-length 2 can be 
expressed as a union of eight nonoverlapping cubes of 
side-length 1. It is because of this that if you enlarge a d- 
dimensional shape by a factor r, then its d-dimensional 
“volume” is multiplied by $r^d$. Now suppose that you
would like to exhibit a shape of dimension 1.4. One way
of doing it is to let $r = 2^{5/7}$, so that $r^{1.4} = 2$, and find
a shape X such that if you expand X by a factor of r,
then the expanded shape can be expressed as a union of
two disjoint copies of X. Two copies of X ought to have
twice the “volume” of X itself, so the dimension d of X
ought to satisfy the equation $r^d = 2$. By our choice of
r, this tells us that the dimension of X is 1.4. For more
details, see dimension [III.17].
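
The rule used here is that if expanding a shape by a factor r produces N nonoverlapping copies of it, then its dimension d satisfies $r^d = N$, that is, d = log N / log r. The following Python sketch (the function name is our own) applies this to the examples above.

```python
import math

def similarity_dimension(copies, scale):
    """Solve scale**d == copies for d."""
    return math.log(copies) / math.log(scale)

examples = [
    ("line segment", 2, 2),
    ("square", 4, 2),
    ("cube", 8, 2),
    ("the shape X", 2, 2 ** (5 / 7)),
]
for name, copies, scale in examples:
    print(name, round(similarity_dimension(copies, scale), 6))
```

Up to rounding, the four values are 1, 2, 3, and 1.4, so the new definition agrees with the old one on the simple shapes while allowing a non-integer answer for X.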

Another concept that seems at first to make no sense 
is noncommutative geometry. The word “commutative” 
applies to binary operations [I.2 §2.4] and therefore
belongs to algebra rather than geometry, so what could 
“noncommutative geometry” possibly mean? 

By now the answer should not be a surprise: one refor- 
mulates part of geometry in terms of a certain algebraic 
structure and then generalizes the algebra. The algebraic 
structure involves a commutative binary operation, so 
one can generalize the algebra by allowing the binary 
operation not to be commutative. 

The part of geometry in question is the study of man-
ifolds [I.3 §6.9]. Associated with a manifold X is the
set C(X) of all continuous complex-valued functions
defined on X. Given two functions f, g in C(X), and
two complex numbers $\lambda$ and $\mu$, the linear combination
$\lambda f + \mu g$ is another continuous complex-valued function,
so it also belongs to C(X). Therefore, C(X) is a vec-
tor space. However, one can also multiply f and g to
form the continuous function fg (defined by (fg)(x) =
f(x)g(x)). This multiplication has various natural prop-
erties (for instance, f(g + h) = fg + fh for all functions
f, g, and h) that make C(X) into an algebra, and even a
C*-algebra [IV.19 §3]. It turns out that a great deal of
the geometry of a compact manifold X can be reformu-
lated purely in terms of the corresponding C*-algebra
C(X). The word “purely” here means that it is not nec-
essary to refer to the manifold X in terms of which the
algebra C(X) was originally defined: all one uses is the
fact that C(X) is an algebra. This raises the possibil-
ity that there might be algebras that do not arise geo-
metrically, but to which the reformulated geometrical 
concepts nevertheless apply. 

An algebra has two binary operations: addition and 
multiplication. Addition is always assumed to be com- 
mutative, but multiplication is not: when multiplication 
is commutative as well, one says that the algebra is com-
mutative. Since fg and gf are clearly the same func-
tion, the algebra C(X) is a commutative C*-algebra, so
the algebras that arise geometrically are always commu-
tative. However, many geometrical concepts, once they
have been reformulated in algebraic terms, continue to
make sense for noncommutative C*-algebras, and that
is why the phrase “noncommutative geometry” is used.
For more details, see operator algebras [IV.19 §5].
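
The contrast between the two kinds of multiplication can be seen in a small Python sketch. The pointwise product of functions is the multiplication of C(X) described above; as a noncommutative example we use 2 × 2 matrices, which form one of the simplest noncommutative C*-algebras. The particular functions, sample points, and matrices are arbitrary choices of ours.

```python
def pointwise_product(f, g):
    """The product fg in C(X): (fg)(x) = f(x) g(x)."""
    return lambda x: f(x) * g(x)

def matrix_product(a, b):
    """Product of two 2 x 2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

f = lambda x: x + 1
g = lambda x: x * x
fg, gf = pointwise_product(f, g), pointwise_product(g, f)
samples = [-2, -1, 0, 1, 2]
print(all(fg(x) == gf(x) for x in samples))  # True: fg and gf agree

A = ((0, 1), (0, 0))
B = ((0, 0), (1, 0))
print(matrix_product(A, B))  # ((1, 0), (0, 0))
print(matrix_product(B, A))  # ((0, 0), (0, 1))
```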

This process of reformulating and then generalizing
underlies many of the most important advances in math-
ematics. Let us briefly look at a third example. The fun-
damental theorem of arithmetic [V.16] is, as its
name suggests, one of the foundation stones of num-
ber theory: it states that every positive integer can be
written in exactly one way as a product of prime num-
bers. However, number theorists like to look at enlarged
number systems, and for most of these the obvious ana-
logue of the fundamental theorem of arithmetic is no
longer true. For example, in the ring [III.82 §1] of num-
bers of the form $a + b\sqrt{-5}$ (where a and b are required
to be integers), the number 6 can be written either as
$2 \times 3$ or as $(1 + \sqrt{-5}) \times (1 - \sqrt{-5})$. Since none of the
numbers 2, 3, $1 + \sqrt{-5}$, or $1 - \sqrt{-5}$ can be decomposed
further, the number 6 has two genuinely different prime
factorizations in this ring.
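
Both factorizations, and the claim that the factors cannot be decomposed further, can be checked with a little arithmetic. The Python sketch below stores $a + b\sqrt{-5}$ as the pair (a, b) and uses the norm $a^2 + 5b^2$, which is multiplicative; the norm is a standard tool that we bring in for the check, and the function names are our own. Since the four factors have norms 4, 9, 6, and 6, a proper factor of any of them would have to have norm 2 or 3.

```python
from math import isqrt

def multiply(x, y):
    """Product of a + b*sqrt(-5) and c + d*sqrt(-5), stored as integer pairs."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    """N(a + b*sqrt(-5)) = a^2 + 5*b^2, which satisfies N(xy) = N(x) N(y)."""
    a, b = x
    return a * a + 5 * b * b

def has_element_of_norm(n):
    """Is there a ring element whose norm is exactly n?"""
    bound = isqrt(n)
    return any(
        norm((a, b)) == n
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
    )

two, three = (2, 0), (3, 0)
plus, minus = (1, 1), (1, -1)  # 1 + sqrt(-5) and 1 - sqrt(-5)

print(multiply(two, three), multiply(plus, minus))     # (6, 0) (6, 0)
print([norm(x) for x in (two, three, plus, minus)])    # [4, 9, 6, 6]
print(has_element_of_norm(2), has_element_of_norm(3))  # False False
```

Because no element has norm 2 or 3, none of the four factors splits into two factors of norm greater than 1, and the only elements of norm 1 are 1 and -1.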

There is, however, a natural way of generalizing
the concept of “number” to include ideal numbers
[III.82 §2] that allow one to prove a version of the fun-
damental theorem of arithmetic in rings such as the one
just defined. First, we must reformulate: we associate
with each number $\gamma$ the set of all its multiples $\delta\gamma$, where
$\delta$ belongs to the ring. This set, which is denoted $(\gamma)$,
has the following closure property: if $\alpha$ and $\beta$ belong to
$(\gamma)$ and $\delta$ and $\epsilon$ are any two elements of the ring, then
$\delta\alpha + \epsilon\beta$ belongs to $(\gamma)$.

A subset of a ring with that closure property is called 
an ideal. If the ideal is of the form $(\gamma)$ for some number
$\gamma$, then it is called a principal ideal. However, there are
ideals that are not principal, so we can think of the set
of ideals as generalizing the set of elements of the orig-
inal ring (once we have reformulated each element $\gamma$ as
the principal ideal $(\gamma)$). It turns out that there are nat-
ural notions of addition and multiplication that can be
applied to ideals. Moreover, it makes sense to define an
ideal I to be “prime” if the only way of writing I as a prod- 
uct JK is if one of J and K is a “unit.” In this enlarged 
set, unique factorization turns out to hold. These con- 
cepts give us a very useful way to measure “the extent 
to which unique factorization fails” in the original ring. 
For more details, see algebraic numbers [IV.3 §7]. 

3.5 Higher Dimensions and Several Variables 

We have already seen that the study of polynomial equa- 
tions becomes much more complicated when one looks 
not just at single equations in one variable, but at sys-
tems of equations in several variables. Similarly, we have
seen that partial differential equations [I.3 §5.4],
which can be thought of as differential equations involv-
ing several variables, are typically much more difficult to
analyze than ordinary differential equations, that is, dif-
ferential equations in just one variable. These are two
notable examples of a process that has generated many
of the most important problems and results in math-
ematics, particularly over the last century or so: the
process of generalization from one variable to several
variables.

Figure 1. The densest possible packing of circles in the plane.

Suppose one has an equation that involves three real 
variables, x, y, and z. It is often useful to think of 
the triple (x,y,z) as an object in its own right, rather 
than as a collection of three numbers. Furthermore, 
this object has a natural interpretation: it represents 
a point in three-dimensional space. This geometrical 
interpretation is important, and goes a long way toward 
explaining why extensions of definitions and theorems 
from one variable to several variables are so interest- 
ing. If we generalize a piece of algebra from one vari- 
able to several variables, we can also think of what we 
are doing as generalizing from a one-dimensional set-
ting to a higher-dimensional setting. This idea leads
to many links between algebra and geometry, allowing 
techniques from one area to be used to great effect in 
the other. 

4 Discovering Patterns 

Suppose that you wish to fill the plane as densely as 
possible with nonoverlapping circles of radius 1. How 
should you do it? This question is an example of a so- 
called packing problem. The answer is known, and it is 
what one might expect: you should arrange the circles 
so that their centers form a triangular lattice, as shown 
in figure 1. In three dimensions a similar result is true, 
but much harder to prove: until recently it was a famous 
open problem known as the Kepler conjecture. Several 
mathematicians wrongly claimed to have solved it, but in 
1998 a long and complicated solution, obtained with the 
help of a computer, was announced by Thomas Hales,
and although his solution has proved very hard to check,
the consensus is that it is probably correct. 

Questions about packing of spheres can be asked in 
any number of dimensions, but they become harder and 
harder as the dimension increases. Indeed, it is likely 
that the best density for a ninety-seven-dimensional 
packing, say, will never be known. Experience with sim- 
ilar problems suggests that the best arrangement will 
almost certainly not have a simple structure such as one 
sees in two dimensions, so that the only method for find-
ing it would be a “brute-force search” of some kind. How-
ever, to search for the best possible complicated struc-
ture is not feasible: even if one could somehow reduce
the search to finitely many possibilities, there would be
far more of them than one could feasibly check. 

When a problem looks too difficult to solve, one
should not give up completely. A much more productive
reaction is to formulate related but more approachable
questions. In this case, instead of trying to discover
the very best packing, one can simply see how dense a
packing one can find. Here is a sketch of an argument
that gives a goodish packing in n dimensions, when n is
large. One begins by taking a maximal packing: that is,
one simply picks unit sphere after unit sphere until it is
no longer possible to pick another one without it
overlapping one of the spheres already chosen. Now let x
be any point of $\mathbb{R}^n$. Maximality means that, for at
least one of the spheres we have chosen, the distance
from its center to x is less than 2; otherwise we could
take a unit sphere about x and it would not overlap any
of the other spheres. Therefore, if we take all the spheres
in the collection and expand them by a factor of 2, then
we cover all of $\mathbb{R}^n$. Since expanding an n-dimensional
sphere by a factor of 2 increases its (n-dimensional)
volume by a factor of $2^n$, the proportion of $\mathbb{R}^n$ covered by
the unexpanded spheres must be at least $2^{-n}$.
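
The maximal-packing argument can be tried out on a small scale. The following is only a sketch under simplifying assumptions that are not in the text: it works in two dimensions, restricts candidate centers to a finite grid in a square, and uses a shuffled order to stand in for a "haphazard" choice. The function name and parameters are illustrative.

```python
import math
import random

def greedy_maximal_packing(side, step, seed=0):
    """Pick unit-disc centres greedily from a grid of candidate points.

    Candidates are visited in a shuffled (haphazard) order. A candidate is
    accepted when it is at distance >= 2 from every centre chosen so far,
    so that the unit discs about the centres do not overlap.
    """
    count = int(round(side / step)) + 1
    candidates = [(i * step, j * step) for i in range(count) for j in range(count)]
    random.Random(seed).shuffle(candidates)
    centres = []
    for p in candidates:
        if all(math.dist(p, c) >= 2 for c in centres):
            centres.append(p)
    return candidates, centres

candidates, centres = greedy_maximal_packing(side=10.0, step=0.25)

# Maximality: every candidate lies within distance < 2 of some centre,
# so discs of radius 2 about the centres cover all the candidates.
uncovered = [p for p in candidates if all(math.dist(p, c) >= 2 for c in centres)]
print(len(centres), "centres;", len(uncovered), "uncovered candidates")
```

Nothing about the resulting arrangement is controlled except that it cannot be extended, which is exactly the property the covering step relies on.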

Notice that in the above argument we learned nothing 
at all about the nature of the arrangements of spheres 
with density 2~ n . All we did was take a maximal packing, 
and that can be done in a very haphazard way. This is 
in marked contrast with the approach that worked in 
two dimensions, where we defined a specific pattern of
circles.

This contrast pervades all of mathematics. For some 
problems, the best approach is to build a highly struc- 
tured pattern that does what you want, while for 
others (usually problems for which there is no hope of
obtaining an exact answer) it is better to look for less
specific arrangements. “Highly structured” in this context
often means “possessing a high degree of symmetry.”


The triangular lattice is a rather simple pattern, but
some highly structured patterns are much more complicated,
and much more of a surprise when they are
discovered. A notable example occurs in packing problems.
By and large, the higher the dimension you are
working in, the more difficult it is to find good patterns,
but an exception to this general rule occurs at twenty-four
dimensions. Here, there is a remarkable construction,
known as the Leech lattice, which gives rise to a
miraculously dense packing. Formally, a lattice in $\mathbb{R}^n$ is
a subset A with the following three properties.

(i) If x and y belong to A, then so do x + y and x - y.

(ii) If x belongs to A, then x is isolated. That is, there is
some d > 0 such that the distance between x and
any other point of A is at least d.

(iii) A is not contained in any (n - 1)-dimensional subspace
of $\mathbb{R}^n$.

A good example of a lattice is the set $\mathbb{Z}^n$ of all points
in $\mathbb{R}^n$ with integer coordinates. If one is searching for a
dense packing, then it is a good idea to look at lattices, 
since if you know that every nonzero point in a lattice 
has distance at least d from 0, then you know that any 
two points have distance at least d from each other. This 
is because the distance between x and y is the same as 
the distance between 0 and y - x, both of which lie in 
the lattice if x and y do. Thus, instead of having to look 
at the whole lattice, one can get away with looking at a 
small portion around 0. 

In twenty-four dimensions it can be shown that there 
is a lattice A with the following additional properties, 
and that it is unique, in the sense that any other lattice 
with those properties is just a rotation of the first one. 

(iv) There is a 24 × 24 matrix M with determinant
[III.15] equal to 1 such that A consists of all integer
combinations of the columns of M.

(v) If v is a point in A, then the square of the distance 
from 0 to v is an even integer. 

(vi) The nearest nonzero vector to 0 is at distance 2.
Thus, the balls of radius 1 about the points in A
form a packing of $\mathbb{R}^{24}$.

The nearest nonzero vector is far from unique: in fact
there are 196 560 of them, which is a remarkably large
number considering that these points must all be at
distance at least 2 from each other.

The Leech lattice also has an extraordinary degree of
symmetry. To be precise, it has 8 315 553 613 086 720 000
rotational symmetries. (This number equals
$2^{22} \cdot 3^9 \cdot 5^4 \cdot 7^2 \cdot 11 \cdot 13 \cdot 23$.)
If you take the quotient [1.3 §3.3]
of its symmetry group by the subgroup consisting of
the identity and minus the identity, then you obtain the
Conway group $\mathrm{Co}_1$, which is one of the famous sporadic
simple groups [V.8]. The existence of so many symmetries
makes it easier still to determine the smallest distance
from 0 of any nonzero point of the lattice, since
once you have checked one distance you have automatically
checked lots of others (just as, in the triangular
lattice, the six-fold rotational symmetry tells us that the
distances from 0 to its six neighbors are all the same).
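
As a quick arithmetic check, the prime factorization quoted above can be multiplied out. This is a minimal sketch; the dictionary name is illustrative, and the halving step simply reflects the quotient by the two-element subgroup described in the text.

```python
# Exponents of the primes in the number of rotational symmetries quoted above.
factors = {2: 22, 3: 9, 5: 4, 7: 2, 11: 1, 13: 1, 23: 1}

order = 1
for prime, exponent in factors.items():
    order *= prime ** exponent

print(order)        # number of rotational symmetries of the Leech lattice
print(order // 2)   # size of the quotient by {identity, minus identity}
```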

These facts about the Leech lattice illustrate a gen- 
eral principle of mathematical research: often, if a math- 
ematical construction has one remarkable property, it 
will have others as well. In particular, a high degree of 
symmetry will often be related to other interesting fea- 
tures. So, although it is a surprise that the Leech lat- 
tice exists at all, it is not as surprising when one then 
discovers that it gives an extremely dense packing of 
$\mathbb{R}^{24}$. In fact, it was shown in 2004 by Henry Cohn and
Abhinav Kumar that it gives the densest possible packing
of spheres in twenty-four-dimensional space, at least
among all packings derived from lattices. It is probably
the densest packing of any kind, but this has not yet
been proved.

5 Explaining Apparent Coincidences 

The largest of all the sporadic finite simple groups is
called the Monster group. Its name is partly explained
by the fact that it has
$2^{46} \cdot 3^{20} \cdot 5^9 \cdot 7^6 \cdot 11^2 \cdot 13^3 \cdot 17 \cdot 19 \cdot 23 \cdot 29 \cdot 31 \cdot 41 \cdot 47 \cdot 59 \cdot 71$
elements. How can one
hope to understand a group of this size?

One of the best ways is to show that it is a group
of symmetries of some other mathematical object (see
the article on representation theory [IV.12] for much
more on this theme), and the smaller that object is, the
better. We have just seen that another large sporadic
group, the Conway group $\mathrm{Co}_1$, is closely related to the
symmetry group of the Leech lattice. Might there be a
lattice that played a similar role for the Monster group?

It is not hard to show that there will be at least some 
lattice that works, but more challenging is to find one 
of small dimension. It has been shown that the smallest 
possible dimension that can be used is 196 883. 

Now let us turn to a different branch of mathematics.
If you look at the article about algebraic numbers
[IV.3 §8] you will see a definition of a function j(z),
called the elliptic modular function, of central importance
in algebraic number theory. It is given as the sum
of a series that starts

$$j(z) = e^{-2\pi i z} + 744 + 196\,884\,e^{2\pi i z} + 21\,493\,760\,e^{4\pi i z} + 864\,299\,970\,e^{6\pi i z} + \cdots.$$

Rather intriguingly, the coefficient of $e^{2\pi i z}$ in this series
is 196 884, one more than the smallest possible dimension
of a lattice that has the Monster group as its group
of symmetries.
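
The first few coefficients of this series can be recomputed with exact integer arithmetic. The sketch below relies on an assumption that is not stated in the text, namely the standard expression $j = E_4^3/\Delta$, where, writing $q = e^{2\pi i z}$, $E_4 = 1 + 240\sum_{n \ge 1} \sigma_3(n) q^n$ and $\Delta = q\prod_{n \ge 1}(1 - q^n)^{24}$. The function names are illustrative.

```python
def sigma3(n):
    """Sum of the cubes of the divisors of n."""
    return sum(d ** 3 for d in range(1, n + 1) if n % d == 0)

def multiply(a, b, n):
    """Product of two power series in q, truncated after the q^n term."""
    out = [0] * (n + 1)
    for i, x in enumerate(a[: n + 1]):
        for j, y in enumerate(b[: n + 1 - i]):
            out[i + j] += x * y
    return out

def j_coefficients(n):
    """Coefficients of q^-1, q^0, ..., q^(n-1) in j, via j = E4^3 / Delta."""
    e4 = [1] + [240 * sigma3(k) for k in range(1, n + 1)]
    e4_cubed = multiply(multiply(e4, e4, n), e4, n)
    # prod = Delta / q = product of (1 - q^k)^24, truncated after q^n.
    prod = [1] + [0] * n
    for k in range(1, n + 1):
        for _ in range(24):
            for m in range(n, k - 1, -1):
                prod[m] -= prod[m - k]
    # Power-series division: solve quotient * prod = e4_cubed term by term.
    quotient = []
    for m in range(n + 1):
        quotient.append(e4_cubed[m] - sum(quotient[i] * prod[m - i] for i in range(m)))
    return quotient

coefficients = j_coefficients(4)
print(coefficients)
print(coefficients[2] - 196883)   # the difference from the dimension 196 883
```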

It is not obvious how seriously we should take this 
observation, and when it was first made by John McKay 
opinions differed about it. Some believed that it was 
probably just a coincidence, since the two areas seemed 
to be so different and unconnected. Others took the attitude
that the function j(z) and the Monster group are
so important in their respective areas, and the number
196 883 so large, that the surprising numerical fact was
probably pointing to a deep connection that had not yet
been uncovered.

It turned out that the second view was correct. After 
studying the coefficients in the series for j(z), McKay 
and John Thompson were led to a conjecture that 
related them all (and not just 196 884) to the Mon- 
ster group. This conjecture was extended by John Con- 
way and Simon Norton, who formulated the “Monstrous 
moonshine” conjecture, which was eventually proved 
by Richard Borcherds in 1992. (The word “moonshine”
reflects the initial disbelief that there would be a serious
relationship between the Monster group and the
j-function.)

In order to prove the conjecture, Borcherds introduced
a new algebraic structure, which he called a vertex
algebra [IV.13]. And to analyze vertex algebras, he used
results from string theory [IV.13 §2]. In other words,
he explained the connection between two very
different-looking areas of pure mathematics with the help of
concepts from theoretical physics.

This example demonstrates in an extreme way another 
general principle of mathematical research: if you can 
obtain the same series of numbers (or the same structure 
of a more general kind) from two different mathematical 
sources, then those sources are probably not as differ- 
ent as they seem. Moreover, if you can find one deep 
connection, you will probably be led to others. There 
are many other examples where two completely different
calculations give the same answer, and many of them
remain unexplained. This phenomenon results in some
of the most difficult and fascinating unsolved problems
in mathematics. (See the introduction to mirror
symmetry [IV.14] for another example.)

Interestingly, the j-function leads to a second famous
mathematical “coincidence.” There may not seem to be
anything special about the number $e^{\pi\sqrt{163}}$, but here is
the beginning of its decimal expansion:

$$e^{\pi\sqrt{163}} = 262\,537\,412\,640\,768\,743.99999999999925\ldots,$$

which is astonishingly close to an integer. Again it is
initially tempting to dismiss this as a coincidence, but
one should think twice before yielding to the temptation.
After all, there are not all that many numbers that can
be defined as simply as $e^{\pi\sqrt{163}}$, and each one has a
probability of less than one in a million million of being as
close to an integer as $e^{\pi\sqrt{163}}$ is. In fact, it is not a
coincidence at all: for an explanation see algebraic numbers
[IV.3 §8].
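
The decimal expansion can be reproduced with Python's standard `decimal` module. This is a sketch only: the working precision of 60 digits and the use of Machin's formula to obtain π are choices made here for illustration, not anything taken from the text.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in significant digits

def arctan_inverse(x):
    """arctan(1/x) for an integer x > 1, summed from its Taylor series."""
    x = Decimal(x)
    power = 1 / x                 # holds 1 / x^(2k+1)
    total = power
    k = 0
    eps = Decimal(10) ** -58
    while power > eps:
        k += 1
        power /= x * x
        term = power / (2 * k + 1)
        total += -term if k % 2 else term
    return total

# Machin's formula: pi = 16 arctan(1/5) - 4 arctan(1/239).
pi = 16 * arctan_inverse(5) - 4 * arctan_inverse(239)

value = (pi * Decimal(163).sqrt()).exp()
nearest = value.to_integral_value()   # nearest integer
gap = nearest - value                 # distance to that integer

print(value)
print(nearest, gap)
```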

6 Counting and Measuring 

How many rotational symmetries are there of a regular
icosahedron? Here is one way to work it out. Choose a
vertex v of the icosahedron and let v' be one of its
neighbors. An icosahedron has twelve vertices, so there are
twelve places where v could end up after the rotation.
Once we know where v goes, there are five possibilities
for v' (since each vertex has five neighbors and v' must
still be a neighbor of v after the rotation). Once we have
determined where v and v' go, there is no further choice
we can make, so the number of rotational symmetries is
5 × 12 = 60.
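
The two numbers used in this argument, twelve vertices and five neighbors per vertex, can be checked from explicit coordinates. The sketch below assumes the standard model of the icosahedron whose vertices are the cyclic permutations of (0, ±1, ±φ), with φ the golden ratio, in which edges have length 2. It counts ordered pairs of adjacent vertices, which is what the argument above matches with rotations; it does not construct the rotations themselves.

```python
import math
from itertools import product

phi = (1 + math.sqrt(5)) / 2

# Standard coordinates: the cyclic permutations of (0, +-1, +-phi).
vertices = []
for s1, s2 in product((1, -1), repeat=2):
    vertices += [(0, s1, s2 * phi), (s1, s2 * phi, 0), (s2 * phi, 0, s1)]

def neighbours(v):
    """Vertices joined to v by an edge (edges have length 2 in this model)."""
    return [w for w in vertices if abs(math.dist(v, w) - 2) < 1e-9]

# Ordered pairs (w, w') of adjacent vertices: the possible images of (v, v').
ordered_adjacent_pairs = sum(len(neighbours(v)) for v in vertices)
print(len(vertices), ordered_adjacent_pairs)
```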

This is a simple example of a counting argument, that 
is, an answer to a question that begins “How many.” 
However, the word “argument” is at least as important as 
the word “counting,” since we do not put all the symme- 
tries in a row and say “one, two, three, . . . , sixty,” as we 
might if we were counting in real life. What we do instead 
is come up with a reason for the number of rotational 
symmetries being 5 × 12. At the end of the process, we
understand more about those symmetries than merely 
how many there are. Indeed, it is possible to go further 
and show that the group of rotations of the icosahedron 
is $A_5$, the alternating group [III.70] on five elements.

6.1 Exact Counting 

Here is a more sophisticated counting problem. A
one-dimensional random walk of n steps is a sequence
of integers $a_0, a_1, a_2, \ldots, a_n$ such that for each i the
difference $a_i - a_{i-1}$ is either 1 or -1. For example,
0, 1, 2, 1, 2, 1, 0, -1 is a seven-step random walk. The
number of n-step random walks that start at 0 is clearly
$2^n$, since there are two choices for each step (either you
add 1 or you subtract 1).


Now let us try a slightly harder problem. How many 
walks of length 2n are there that start and end at 0? (We 
look at walks of length 2n since a walk that starts and
ends in the same place must have an even number of 
steps.) 

In order to think about this problem, it helps to use the 
letters R and L (for “right” and “left”) to denote adding 1 
and subtracting 1, respectively. This gives us an alterna- 
tive notation for random walks that start at 0: for exam- 
ple, the walk 0, 1, 2, 1, 2, 1, 0, -1 would be rewritten as 
RRLRLLL. Now a walk will end at 0 if and only if the 
number of Rs is equal to the number of Ls. Moreover, 
if we are told the set of steps where an R occurs, then 
we know the entire walk. So what we are counting is the 
number of ways of choosing n of the 2n steps as the
steps where an R will occur. And this is well known to
be $(2n)!/(n!)^2$.
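
For small n the count can be confirmed by listing every walk. This is a brute-force sketch (the function name is illustrative), useful only as a check since it examines all $2^{2n}$ step sequences.

```python
from itertools import product
from math import comb, factorial

def closed_walks(n):
    """Count walks of length 2n from 0 back to 0 by listing all step sequences."""
    return sum(1 for steps in product((1, -1), repeat=2 * n) if sum(steps) == 0)

for n in range(1, 7):
    formula = factorial(2 * n) // factorial(n) ** 2
    assert closed_walks(n) == formula == comb(2 * n, n)
    print(n, closed_walks(n))
```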

Now let us look at a related quantity that is considerably
less easy to determine: the number W(n) of walks
of length 2n that start and end at 0 and are never negative.
Here, in the notation introduced for the previous
problem, is a list of all such walks of length 6: RRRLLL,
RRLRLL, RRLLRL, RLRRLL, and RLRLRL.

Now three of these five walks do not just start and end 
at 0 but visit it in the middle: RRLLRL visits it after four 
steps, RLRRLL after two, and RLRLRL after two and four. 
Suppose we have a walk of length 2n that is never negative
and visits 0 for the first time after 2k steps. Then
the remainder of the walk is a walk of length 2(n - k)
that starts and ends at 0 and is never negative. There
are W(n - k) of these. As for the first 2k steps of such
a walk, they must begin with R and end with L, and in
between must never visit 0. This means that between the
initial R and the final L they give a walk of length 2(k - 1)
that starts and ends at 1 and is never less than 1. The
number of such walks is clearly the same as W(k - 1).
Therefore, since the first visit to 0 must take place after
2k steps for some k between 1 and n, W satisfies the
following slightly complicated recurrence relation:

$$W(n) = W(0)W(n-1) + W(1)W(n-2) + \cdots + W(n-1)W(0).$$

Here, W(0) is taken to be equal to 1.

This allows us to calculate the first few values of
W. We have W(1) = W(0)W(0) = 1, which is easier
to see directly: the only possibility is RL. Then
W(2) = W(1)W(0) + W(0)W(1) = 2, and W(3), which
counts the number of such walks of length 6, equals
W(0)W(2) + W(1)W(1) + W(2)W(0) = 5, confirming our
earlier calculation.
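
The recurrence translates directly into a short memoized function, which can be compared with a brute-force count for small n. This is a sketch with illustrative names. The final comparison uses the closed form $\binom{2n}{n}/(n+1)$, a standard fact about this sequence that is not derived in the text.

```python
from functools import lru_cache
from itertools import product
from math import comb

@lru_cache(maxsize=None)
def W(n):
    """Number of never-negative walks of length 2n from 0 to 0, via the recurrence."""
    if n == 0:
        return 1
    return sum(W(k) * W(n - 1 - k) for k in range(n))

def W_brute(n):
    """The same count, obtained by checking every sequence of 2n steps."""
    count = 0
    for steps in product((1, -1), repeat=2 * n):
        position = 0
        for step in steps:
            position += step
            if position < 0:
                break
        else:
            if position == 0:
                count += 1
    return count

print([W(n) for n in range(8)])
```

Each value costs about n multiplications once the earlier ones are cached, so the first n values take on the order of n² operations in total.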

Of course, it would not be a good idea to use the recurrence
relation directly if one wished to work out W(n)
for large values of n such as $10^{10}$. However, the recurrence
is of a sufficiently nice form that it is amenable
to treatment by generating functions [IV.22 §2.4], as
is explained in enumerative and algebraic combinatorics
[IV.22 §3]. (To see the connection with that
discussion, replace the letters R and L by the square
brackets [ and ], respectively. A legal bracketing then
corresponds to a walk that is never negative.)

The argument above gives an efficient way of calculating
W(n) exactly. There are many other exact counting
arguments in mathematics. Here is a small further
sample of quantities that mathematicians know how to 
count exactly without resorting to “brute force.” (See 
the introduction to [IV.22] for a discussion of when one 
regards a counting problem as solved.) 

(i) The number r(n) of regions that a plane is cut into
by n lines if no two of the lines are parallel and no three
concurrent. The first four values of r(n) are 2, 4, 7, and
11. It is not hard to prove that r(n) = r(n - 1) + n, which
leads to the formula $r(n) = \frac{1}{2}(n^2 + n + 2)$. This statement,
and its proof, can be generalized to higher dimensions.
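
The recurrence r(n) = r(n - 1) + n, started from r(0) = 1 (no lines, one region), can be iterated and compared with the closed form $\frac{1}{2}(n^2 + n + 2)$. This is a minimal sketch with an illustrative function name.

```python
def regions(n):
    """Regions cut out by n lines in general position, from r(k) = r(k - 1) + k."""
    r = 1  # with no lines, the whole plane is a single region
    for k in range(1, n + 1):
        r += k  # the k-th line crosses k - 1 earlier lines and splits k regions
    return r

print([regions(n) for n in range(1, 5)])
```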

(ii) The number s(n) of ways of writing n as a sum
of four squares. Here we allow zero and negative numbers
and we count different orderings as different (so,
for example, $1^2 + 3^2 + 4^2 + 2^2$, $3^2 + 4^2 + 1^2 + 2^2$,
$1^2 + (-3)^2 + 4^2 + 2^2$, and $0^2 + 1^2 + 2^2 + 5^2$ are considered
to be four different ways of writing 30 as a sum of four
squares). It can be shown that s(n) is equal to 8 times
the sum of all the divisors of n that are not multiples of
4. For example, the divisors of 12 are 1, 2, 3, 4, 6, and 12,
of which 1, 2, 3, and 6 are not multiples of 4. Therefore
s(12) = 8(1 + 2 + 3 + 6) = 96. The different ways are
$1^2 + 1^2 + 1^2 + 3^2$, $0^2 + 2^2 + 2^2 + 2^2$, and the other
expressions that can be obtained from these ones by reordering
and replacing positive integers by negative ones.
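
The divisor formula can be compared with a direct enumeration for small n. This is a sketch with illustrative function names; the brute-force count runs over all ordered quadruples of integers, signs included, as in the convention described above.

```python
from itertools import product
from math import isqrt

def s_brute(n):
    """Count ordered quadruples of integers whose squares sum to n."""
    r = isqrt(n)
    values = range(-r, r + 1)
    return sum(1 for q in product(values, repeat=4) if sum(x * x for x in q) == n)

def s_formula(n):
    """8 times the sum of the divisors of n that are not multiples of 4."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)

print(s_brute(12), s_formula(12))
```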

(iii) The number of lines in space that meet a given
four lines $L_1$, $L_2$, $L_3$, and $L_4$ when those four are in
“general position.” (This means that they do not have special
properties such as two of them being parallel or
intersecting each other.) It turns out that for any three such
lines, there is a subset of $\mathbb{R}^3$ known as a quadric
surface that contains them, and that this quadric surface is
unique. Let us take the surface for $L_1$, $L_2$, and $L_3$ and call
it S.

The surface S has some interesting properties that 
allow us to solve the problem. The main one is that one 
can find a continuous family of lines (that is, a collec- 
tion of lines L(t), one for each real number t, that varies 
continuously with t) that, between them, make up the 
surface S and include each of the lines $L_1$, $L_2$, and $L_3$.


But there is also another such continuous family of lines 
M(s), each of which meets every line L(t) in exactly one 
point. In particular, every line M(s) meets all of $L_1$, $L_2$,
and $L_3$, and in fact every line that meets all of $L_1$, $L_2$, and
$L_3$ must be one of the lines M(s).

It can be shown that $L_4$ intersects the surface S in
exactly two points, P and Q. Now P lies in some line M(s)
from the second family, and Q lies in some other line
M(s') (which must be different, or else $L_4$ would equal
M(s) and intersect $L_1$, $L_2$, and $L_3$, contradicting the fact
that the lines $L_i$ are in general position). Therefore, the
two lines M(s) and M(s') intersect all four of the lines
$L_i$. But every line that meets all the $L_i$ has to be one of
the lines M(s) and has to go through either P or Q (since
the lines M(s) lie in S and $L_4$ meets S at only those two
points). Therefore, the answer is 2.

This question can be generalized very considerably, 
and answered by means of a technique known as Schu- 
bert calculus. 

(iv) The number p(n) of ways of writing a positive
integer n as a sum of positive integers, where the order
of the summands is ignored. When
n = 6 this number is 11, since 6 = 1 + 1 + 1 + 1 + 1 + 1 =
2 + 1 + 1 + 1 + 1 = 2 + 2 + 1 + 1 = 2 + 2 + 2 =
3 + 1 + 1 + 1 = 3 + 2 + 1 = 3 + 3 = 4 + 1 + 1 =
4 + 2 = 5 + 1 = 6. The function p(n) is called the partition
function. A remarkable formula, due to Hardy [VI.72]
and Ramanujan [VI.81], gives an approximation α(n) to
p(n) that is so accurate that p(n) is always the nearest
integer to α(n).
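
For modest n, p(n) can be computed exactly by a standard dynamic-programming count. This is a sketch only, and it is not the Hardy-Ramanujan formula mentioned above; the function name is illustrative.

```python
def partition_count(n):
    """Number of ways of writing n as a sum of positive integers, order ignored."""
    ways = [1] + [0] * n          # ways[t] counts partitions of t using parts seen so far
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]

print(partition_count(6))
```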

6.2 Estimates 

Once we have seen example (ii) above, it is natural to 
ask whether it can be generalized. Is there a formula for 
the number t(n) of ways of writing n as a sum of ten 
sixth powers, for example? It is generally believed that 
the answer to this question is no, and certainly no such 
formula has been discovered. However, as with packing
problems, even if an exact answer does not seem
to be forthcoming, it is still very interesting to obtain
estimates. In this case, one can try to define an easily
calculated function f such that f(n) is always approximately
equal to t(n). If even that is too hard, one can
try to find two easily calculated functions L and U such
that L(n) ≤ t(n) ≤ U(n) for every n. If we succeed,
then we call L a lower bound for t and U an upper bound.
Here are a few examples of quantities that nobody knows 
how to count exactly, but for which there are interesting 
approximations, or at least interesting upper and lower 
bounds. 
(i) Probably the most famous approximate counting
problem in all of mathematics is to estimate π(n), the
number of prime numbers less than or equal to n. For
small values of n, we can of course compute π(n)
exactly: for example, π(20) = 8 since the primes less
than or equal to 20 are 2, 3, 5, 7, 11, 13, 17, and 19.
However, there does not seem to be a useful formula for
π(n), and although it is easy to think of a brute-force
algorithm for computing π(n) (look at every number
up to n, test whether it is prime, and keep count as you
go along), such a procedure takes a prohibitively long
time if n is at all large. Furthermore, it does not give us
much insight into the nature of the function π(n).

If, however, we modify the question slightly, and ask
roughly how many primes there are up to n, then we
find ourselves in the area known as analytic number
theory [IV.4], a branch of mathematics with many
fascinating results. In particular, the famous prime number
theorem [V.33], proved by Hadamard [VI.64] and de la
Vallée Poussin [VI.66] at the end of the nineteenth century,
states that π(n) is approximately equal to n/log n,
in the sense that the ratio of π(n) to n/log n converges
to 1 as n tends to infinity.

This statement can be refined. It is believed that the
“density” of primes close to n is about 1/log n, in the
sense that a randomly chosen integer close to n has a
probability of about 1/log n of being prime. This would
suggest that π(n) should be about $\int_0^n dt/\log t$, a function
of n that is known as the logarithmic integral of n,
or li(n).

How accurate is this estimate? Nobody knows, but
the Riemann hypothesis [V.33], perhaps the most
famous unsolved problem in mathematics, is equivalent
to the statement that π(n) and li(n) differ by at most
$c\sqrt{n}\log n$ for some constant c. Since $\sqrt{n}\log n$ is much
smaller than π(n), this would tell us that li(n) was an
extremely good approximation to π(n).
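
The two approximations can be compared numerically for a moderate value of n. In the sketch below, π(n) is computed with a sieve rather than the number-by-number test described above. The logarithmic integral is estimated by a midpoint rule and, as a simplifying assumption, is taken from 2 rather than from 0, which changes it only by a constant close to 1. Function names are illustrative.

```python
import math

def prime_count(n):
    """pi(n), computed with the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)

def log_integral_from_2(n):
    """Midpoint-rule estimate of the integral of dt / log t from 2 to n."""
    return sum(1 / math.log(k + 0.5) for k in range(2, n))

n = 10 ** 6
exact = prime_count(n)
print(exact, n / math.log(n), log_integral_from_2(n))
```

At this value of n the logarithmic integral is off by a little over a hundred, while n/log n is off by several thousand.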

(ii) A self-avoiding walk of length n in the plane
is a sequence of points $(a_0, b_0), (a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)$
with the following properties.

• The numbers $a_i$ and $b_i$ are all integers.

• For each i, one obtains $(a_i, b_i)$ from $(a_{i-1}, b_{i-1})$ by
taking a horizontal or vertical step of length 1. That
is, either $a_i = a_{i-1}$ and $b_i = b_{i-1} \pm 1$, or $a_i = a_{i-1} \pm 1$
and $b_i = b_{i-1}$.

• No two of the points $(a_i, b_i)$ are equal.

The first two conditions tell us that the sequence forms
a two-dimensional walk of length n, and the third says
that this walk never visits any point more than once:
hence the term “self-avoiding.”


Let S(n) be the number of self-avoiding walks of
length n that start at (0, 0). There is no known formula
for S(n), and it is very unlikely that such a formula
exists. However, quite a lot is known about the way the
function S(n) grows as n grows. For instance, it is fairly
easy to prove that $S(n)^{1/n}$ converges to a limit c. The
value of c is not known, but it has been shown (with the
help of a computer) to lie between 2.62 and 2.68.
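
For small n, S(n) can be found by exhaustive search. The following sketch uses a simple depth-first enumeration with illustrative names. It also prints $S(n)^{1/n}$, which at these small lengths is still well above the limiting range quoted above.

```python
def count_self_avoiding_walks(n):
    """Number of self-avoiding walks of length n on the square grid from (0, 0)."""
    def extend(x, y, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt[0], nxt[1], visited, remaining - 1)
                visited.remove(nxt)
        return total

    return extend(0, 0, {(0, 0)}, n)

for n in range(1, 11):
    s = count_self_avoiding_walks(n)
    print(n, s, round(s ** (1 / n), 4))
```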

(iii) Let C(t) be the number of points in the plane
with integer coordinates contained in a circle of radius
t about the origin. That is, C(t) is the number of pairs
(a, b) of integers such that $a^2 + b^2 \le t^2$. A circle of radius
t has area $\pi t^2$, and the plane can be tiled by unit squares,
each of which has a point with integer coordinates at its
center. Therefore, when t is large it is fairly clear (and
not hard to prove) that C(t) is approximately $\pi t^2$.
However, it is much less clear how good this approximation is.

To make this question more precise, let us set ε(t) to
equal $|C(t) - \pi t^2|$. That is, ε(t) is the error in $\pi t^2$ as an
estimate for C(t). It was shown in 1915, by Hardy and
Landau, that ε(t) must be at least $c\sqrt{t}$ for some constant
c > 0, and this estimate, or something very similar,
probably gives the right order of magnitude for ε(t).
However, the best upper bound, proved by Huxley in 1990
(the latest in a long line of successive improvements), is
that ε(t) is at most $A t^{46/73}$ for some constant A.
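
The size of the error can be examined directly for integer radii. This is a sketch with an illustrative function name; restricting t to integers is a simplification made here so that the count stays in exact integer arithmetic.

```python
import math

def lattice_points_in_circle(t):
    """Number of integer pairs (a, b) with a^2 + b^2 <= t^2, for an integer t >= 0."""
    count = 0
    for a in range(-t, t + 1):
        b_max = math.isqrt(t * t - a * a)   # largest b with a^2 + b^2 <= t^2
        count += 2 * b_max + 1
    return count

for t in (10, 100, 1000):
    c = lattice_points_in_circle(t)
    error = abs(c - math.pi * t * t)
    print(t, c, round(error, 2), round(error / math.sqrt(t), 3))
```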

6.3 Averages 

So far, our discussion of estimates and approximations 
has been confined to problems where the aim is to count 
mathematical objects of a given kind. However, that is 
by no means the only context in which estimates can 
be interesting. Given a set of objects, one may wish to 
know, besides its size, roughly what a typical one of 
those objects looks like. Many questions of this kind 
take the form of asking what the average value is of some 
numerical parameter that is associated with each object. 
Here are two examples. 

(i) What is the average distance between the starting 
point and the endpoint of a self-avoiding walk of length 
n? In this instance, the objects are self-avoiding walks of 
length n that start at (0, 0), and the numerical parameter 
is the end-to-end distance. 

Surprisingly, this is a notoriously difficult problem,
and almost nothing is known. Write S(n) for this average
end-to-end distance (not to be confused with the number
of walks considered earlier). It is obvious that n is an
upper bound for S(n), but one would expect a typical
self-avoiding walk to take many twists and turns and end
up traveling much less far than n away from its starting
point. However, there is no known upper bound for S(n)
that is substantially better than n.

In the other direction, one would expect the end- 
to-end distance of a typical self-avoiding walk to be 
greater than that of an ordinary walk, to give it room 
to avoid itself. This would suggest that S(n) is significantly
greater than $\sqrt{n}$, but it has not even been proved
that it is greater.

This is not the whole story, however, and the problem 
will be discussed further in section 8. 

(ii) Let n be a large randomly chosen positive integer
and let ω(n) be the number of distinct prime factors of
n. On average, how large will ω(n) be? As it stands, this
question does not quite make sense because there are
infinitely many positive integers, so one cannot choose
one randomly. However, one can make the question
precise by specifying a large integer m and choosing a
random integer n between m and 2m. It then turns out that
the average size of ω(n) is around log log n.

In fact, much more is known than this. If all you know
about a random variable [III.73 §4] is its average, then
a great deal of its behavior is not determined, so for
many problems calculating averages is just the beginning
of the story. In this case, Hardy and Ramanujan
gave an estimate for the standard deviation [III.73 §4]
of ω(n), showing that it is about $\sqrt{\log\log n}$. Then Erdős
and Kac went even further and gave a precise estimate
for the probability that ω(n) differs from log log n
by more than $c\sqrt{\log\log n}$, proving the surprising fact
that the distribution of ω is approximately Gaussian
[III.73 §5].

To put these results in perspective, let us think about
the range of possible values of ω(n). At one extreme,
n might be a prime itself, in which case it obviously has
just one prime factor. At the other extreme, we can write
the primes in ascending order as $p_1, p_2, p_3, \ldots$ and take
numbers of the form $n = p_1 p_2 \cdots p_k$. With the help
of the prime number theorem, one can show that the
order of magnitude of k is log n / log log n, which is much
bigger than log log n. However, the results above tell us
that such numbers are exceptional: a typical number has
a few distinct prime factors, but nothing like as many as
log m / log log m.
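
The average of ω(n) over the range from m to 2m can be computed with a sieve and set beside log log m. This is a sketch: the value of m and the names are illustrative choices. The two quantities agree only up to a bounded additive term, which is all that "around log log n" promises at this scale.

```python
import math

def omega_table(limit):
    """omega[n] = number of distinct prime factors of n, for 0 <= n <= limit."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:   # no smaller prime divides p, so p is prime
            for multiple in range(p, limit + 1, p):
                omega[multiple] += 1
    return omega

m = 10 ** 5
omega = omega_table(2 * m)
average = sum(omega[m : 2 * m + 1]) / (m + 1)

print(average, math.log(math.log(m)))
print(max(omega[m : 2 * m + 1]))   # the largest value in the range, for contrast
```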

6.4 Extremal Problems 

There are many problems in mathematics where one
wishes to maximize or minimize some quantity in
the presence of various constraints. These are called
extremal problems. As with counting questions, there are
some extremal problems for which one can realistically
hope to work out the answer exactly, and many more for
which, even though an exact answer is out of the question,
one can still aim to find interesting estimates. Here
are some examples of both kinds.

(i) Let n be a positive integer and let X be a set with n 
elements. How many subsets of X can be chosen if none
of these subsets is contained in any other? 

A simple observation one can make is that if two different
sets have the same size, then neither is contained
in the other. Therefore, one way of satisfying the
constraints of the problem is to choose all the sets of some
particular size k. Now the number of subsets of X of
size k is n!/k!(n - k)!, which is usually written $\binom{n}{k}$ (or
${}^nC_k$), and the value of k for which $\binom{n}{k}$ is largest is easily
shown to be n/2 if n is even and (n + 1)/2 if n is odd.
For simplicity let us concentrate on the case when n is
even. What we have just proved is that it is possible to
pick $\binom{n}{n/2}$ subsets of an n-element set in such a way that
none of them contains any other. That is, $\binom{n}{n/2}$ is a lower
bound for the problem. A result known as Sperner’s
theorem states that it is an upper bound as well. That is,
if you choose more than $\binom{n}{n/2}$ subsets of X, then, however
you do it, one of these subsets will be contained
in another. Therefore, the question is answered exactly,
and the answer is $\binom{n}{n/2}$. (When n is odd, then the answer
is $\binom{n}{(n+1)/2}$, as one might now expect.)
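
For very small n the statement can be confirmed by exhaustive search. The sketch below encodes subsets as bitmasks and uses a simple backtracking search; the names are illustrative, and the method is feasible only for n up to about 4 or 5.

```python
from math import comb

def largest_antichain(n):
    """Largest family of subsets of an n-element set with none contained in another."""
    subsets = list(range(1 << n))   # each subset is encoded as a bitmask
    best = 0

    def comparable(a, b):
        return (a & b) == a or (a & b) == b   # one set contains the other

    def extend(i, chosen):
        nonlocal best
        if len(chosen) + (len(subsets) - i) <= best:
            return   # even taking every remaining subset cannot beat the record
        if i == len(subsets):
            best = len(chosen)
            return
        s = subsets[i]
        if all(not comparable(s, t) for t in chosen):
            chosen.append(s)
            extend(i + 1, chosen)
            chosen.pop()
        extend(i + 1, chosen)

    extend(0, [])
    return best

for n in range(1, 5):
    print(n, largest_antichain(n), comb(n, n // 2))
```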

(ii) Suppose that the two ends of a heavy chain are 
attached to two hooks on the ceiling and that the chain 
is not supported anywhere else. What shape will the 
hanging chain take? 

At first, this question does not look like a maximization
or minimization problem, but it can be quickly
turned into one. That is because a general principle from
physics tells us that the chain will settle in the shape that
minimizes its potential energy. We therefore find
ourselves asking a new question: let A and B be two points
at distance d apart, and let $\mathcal{C}$ be the set of all curves of
length l that have A and B as their two endpoints. Which
curve $C \in \mathcal{C}$ has the smallest potential energy? Here one
takes the mass of any portion of the curve to be
proportional to its length. The potential energy of the curve
is equal to mgh, where m is the mass of the curve, g
is the acceleration due to gravity, and h is the height of the
center of gravity of the curve. Since m and g do not
change, another formulation of the question is: which
curve $C \in \mathcal{C}$ has the smallest average height?

This problem can be solved by means of a technique
known as the calculus of variations. Very roughly, the
idea is this. We have a set, $\mathcal{C}$, and a function h defined
on $\mathcal{C}$ that takes each curve $C \in \mathcal{C}$ to its average height. We
are trying to minimize h, and a natural way to approach
that task is to define some sort of derivative and look
for a curve C at which this derivative is 0. Notice that
the word “derivative” here does not refer to the rate of
change of height as you move along the curve. Rather,
it means the (linear) way that the average height of the
entire curve changes in response to small perturbations
of the curve. Using this kind of derivative to find a
minimum is more complicated than looking for the
stationary points of a function defined on $\mathbb{R}$, since $\mathcal{C}$ is
an infinite-dimensional set and is therefore much more
complicated than $\mathbb{R}$. However, the approach can be made
to work, and the curve that minimizes the average height
is known. (It is called a catenary, after the Latin word for
chain.) Thus, this is another minimization problem that
has been answered exactly.

For a typical problem in the calculus of variations, 
one is trying to find a curve, or surface, or more gen- 
eral kind of function, for which a certain quantity is 
minimized or maximized. If a minimum or maximum 
exists (which is by no means automatic when one is 
working with an infinite-dimensional set, so this can be 
an interesting and important question), the object that 
achieves it satisfies a system of partial differential 
equations [I.3 §5.4] known as the Euler-Lagrange equations.
For more about this style of minimization or maximization,
see variational methods [III.94] (and also
OPTIMIZATION AND LAGRANGE MULTIPLIERS [III.66]).

(iii) How many numbers can you choose between 1 and 
n if no three of them are allowed to lie in an arithmetic 
progression? If n = 9 then the answer is 5. To see this, 
note first that no three of the five numbers 1, 2, 4, 8, 9 lie 
in an arithmetic progression. Now let us see if we can 
find six numbers that work. 

If we make one of our numbers 5, then we must leave 
out either 4 or 6, or else we would have the progression 
4, 5, 6. Similarly, we must leave out one of 3 and 7, one 
of 2 and 8, and one of 1 and 9. But then we have left out 
four numbers. It follows that we cannot choose 5 as one 
of the numbers. 

We must leave out one of 1, 2, and 3, and one of 7, 8, 
and 9, so if we leave out 5 then we must include 4 and 
6. But then we cannot include 2 or 8. But we must also 
leave out at least one of 1, 4, and 7, so we are forced to 
leave out at least four numbers. 
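
The case analysis above can also be checked by exhaustive search. The following is a minimal Python sketch, not part of the original argument; the function names are illustrative, and the search is feasible only because n = 9 gives just 512 subsets to examine.

```python
from itertools import combinations


def has_three_term_ap(numbers):
    """Return True if some three of the numbers form an arithmetic progression."""
    chosen = set(numbers)
    for a, b in combinations(sorted(chosen), 2):
        # Here a < b, and the progression a, b, 2b - a has common difference b - a.
        if 2 * b - a in chosen:
            return True
    return False


def largest_ap_free_size(n):
    """Size of the largest subset of {1, ..., n} with no three-term progression."""
    for size in range(n, 0, -1):
        for candidate in combinations(range(1, n + 1), size):
            if not has_three_term_ap(candidate):
                return size
    return 0


print(has_three_term_ap([1, 2, 4, 8, 9]))  # False
print(largest_ap_free_size(9))             # 5
```

The running time of this search grows exponentially with n, which is one way of seeing why bounds, rather than exact answers, are sought for large n.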

An ugly case-by-case argument of this kind is feasi- 
ble when n = 9, but as soon as n is at all large there 
are far too many cases for it to be possible to consider 
them all. For this problem, there does not seem to be a 
tidy answer that tells us exactly which is the largest set 
of integers between 1 and n that contains no arithmetic
progression of length 3. So instead one looks for upper
and lower bounds on its size. To prove a lower bound, 
one must find a good way of constructing a large set 
that does not contain any arithmetic progressions, and 
to prove an upper bound one must show that any set 
of a certain size must necessarily contain an arithmetic 
progression. The best bounds to date are very far apart.
In 1947, Behrend found a set of size $n/e^{c\sqrt{\log n}}$ that contains
no arithmetic progression, and in 1999 Jean Bourgain
proved that every set of size $Cn\sqrt{\log\log n/\log n}$
contains an arithmetic progression. (If it is not obvious
to you that these numbers are far apart, then consider
what happens when $n = 10^{100}$, say. Then $e^{\sqrt{\log n}}$ is about
4 000 000, while $\sqrt{\log n/\log\log n}$ is about 6.5.)
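
The two numbers quoted in the parenthesis can be reproduced directly. This short Python sketch assumes natural logarithms and takes the unspecified constants c and C to be 1; the variable names are illustrative.

```python
import math

# Natural logarithm of n = 10**100, computed without forming n as a float.
log_n = 100 * math.log(10)

# Factor by which the Behrend set is smaller than n (taking c = 1).
behrend_factor = math.exp(math.sqrt(log_n))

# Factor by which the Bourgain bound is smaller than n (taking C = 1).
bourgain_factor = math.sqrt(log_n / math.log(log_n))

print(round(behrend_factor))      # roughly 3.9 million
print(round(bourgain_factor, 2))  # roughly 6.5
```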

(iv) Theoretical computer science provides many min- 
imization problems: if one is programming a computer 
to perform a certain task, then one wants it to do so in as 
short a time as possible. Here is an elementary-sounding 
example: how many steps are needed to multiply two 
n-digit numbers together? 

Even if one is not too precise about what is meant by 
a “step,” one can see that the traditional method, long 
multiplication, takes at least $n^2$ steps since, during the
course of the calculation, each digit of the first number is
multiplied by each digit of the second. One might imagine
that this was necessary, but in fact there are clever
ways of transforming the problem and dramatically 
reducing the time that a computer needs to perform a 
multiplication of this kind. The fastest known method 
uses the fast Fourier transform [III.26] to reduce the
number of steps from $n^2$ to $Cn \log n \log\log n$. Since the
logarithm of a number is much smaller than the number
itself, one thinks of $Cn \log n \log\log n$ as being only just
worse than a bound of the form Cn. Bounds of this form
are called linear, and for a problem like this are clearly
the best one can hope for, since it takes 2n steps even
to read the digits of the two numbers.
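
To make the count of $n^2$ concrete, here is a small Python sketch of long multiplication that counts one "step" for each product of two single digits. This is only the traditional method, not the fast Fourier transform method; the helper names and the choice of what counts as a step are assumptions made for illustration.

```python
def to_digits(number):
    """Decimal digits of a nonnegative integer, least significant first."""
    return [int(d) for d in reversed(str(number))]


def from_digits(digits):
    """Inverse of to_digits (leading zeros are harmless)."""
    return int("".join(str(d) for d in reversed(digits)))


def long_multiply(x_digits, y_digits):
    """Schoolbook multiplication; returns (product digits, digit multiplications)."""
    result = [0] * (len(x_digits) + len(y_digits))
    steps = 0
    for i, x in enumerate(x_digits):
        carry = 0
        for j, y in enumerate(y_digits):
            steps += 1  # one single-digit multiplication
            total = result[i + j] + x * y + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(y_digits)] += carry
    return result, steps


digits, steps = long_multiply(to_digits(1234), to_digits(5678))
print(from_digits(digits), steps)  # 7006652 16
```

For two n-digit inputs the inner statement runs exactly $n^2$ times, which is the quantity the faster methods improve on.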

Another question that is similar in spirit is whether 
there are fast algorithms for matrix multiplication. To 
multiply two n × n matrices using the obvious method
one needs to do $n^3$ individual multiplications of the
numbers in the matrices, but once again there are less
obvious methods that do better. The main breakthrough
on this problem was due to Strassen, who had the idea
of splitting each matrix into four n/2 × n/2 matrices
and multiplying those together. At first it seems as
though one has to calculate the products of eight pairs
of n/2 × n/2 matrices, but these products are related,
and Strassen came up with seven such calculations from
which the eight products could quickly be derived. One
can then apply recursion: that is, use the same idea to
speed up the calculation of the seven n/2 × n/2 matrix
products, and so on.

Strassen’s algorithm reduces the number of numerical
multiplications from about $n^3$ to about $n^{\log_2 7}$. Since
$\log_2 7$ is less than 2.81, this is a significant improvement,
but only when n is large. His basic divide-and-conquer
strategy has been developed further, and the
current record is better than $n^{2.4}$. In the other direction,
the situation is less satisfactory: nobody has found
a proof that one needs to use significantly more than $n^2$
multiplications.
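
The recursion can be sketched in a few lines of Python. The seven products below are the standard textbook form of Strassen's identities, which the text itself does not spell out; the sketch assumes square matrices whose size is a power of 2, stored as lists of rows, and all names are illustrative. A counter records how many multiplications of individual numbers are performed.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a matrix into its four half-size blocks."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def naive_multiply(A, B):
    """The obvious method, using n**3 multiplications of numbers."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def strassen(A, B, counter):
    """Multiply A and B (size a power of 2); counter[0] counts multiplications."""
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Seven half-size products instead of eight.
    m1 = strassen(add(a11, a22), add(b11, b22), counter)
    m2 = strassen(add(a21, a22), b11, counter)
    m3 = strassen(a11, sub(b12, b22), counter)
    m4 = strassen(a22, sub(b21, b11), counter)
    m5 = strassen(add(a11, a12), b22, counter)
    m6 = strassen(sub(a21, a11), add(b11, b12), counter)
    m7 = strassen(sub(a12, a22), add(b21, b22), counter)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


A = [[i + j for j in range(8)] for i in range(8)]
B = [[i * j + 1 for j in range(8)] for i in range(8)]
count = [0]
print(strassen(A, B, count) == naive_multiply(A, B))  # True
print(count[0], 8 ** 3)                               # 343 512
```

For n = 8 the saving is modest (343 multiplications against 512), which fits the remark that the improvement matters only when n is large; the sketch also ignores the cost of the extra additions.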

For more problems of a similar kind, see COMPUTATIONAL
COMPLEXITY [IV.21] and THE MATHEMATICS OF
ALGORITHM DESIGN [VII.5].

(v) Some minimization and maximization problems 
are of a more subtle kind. For example, suppose that 
one is trying to understand the nature of the differences 
between successive primes. The smallest such difference 
is 1 (the difference between 2 and 3), and it is not hard to 
prove that there is no largest difference (given any integer
n greater than 1, none of the numbers between n! + 2
and n! + n is a prime). Therefore, there do not seem to
be interesting maximization or minimization problems 
concerning these differences. 
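
The reason none of the numbers n! + 2, ..., n! + n is prime is that n! + k is divisible by k whenever 2 ≤ k ≤ n. A brief Python sketch confirms this for small n; the trial-division primality test and the function names are illustrative choices only.

```python
import math


def is_prime(m):
    """Primality by trial division; adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def composite_run(n):
    """The n - 1 consecutive integers n! + 2, ..., n! + n."""
    base = math.factorial(n)
    return [base + k for k in range(2, n + 1)]


for n in range(2, 9):
    run = composite_run(n)
    # Each n! + k is divisible by k, so no member of the run is prime.
    assert all((math.factorial(n) + k) % k == 0 for k in range(2, n + 1))
    assert not any(is_prime(m) for m in run)

print(composite_run(5))  # [122, 123, 124, 125]
```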

However, one can in fact formulate some fascinating
problems if one first normalizes in an appropriate way.
As was mentioned earlier in this section, the prime number
theorem states that the density of primes near n is
about 1/log n, so an average gap between two primes
near n will be about log n. If p and q are successive
primes, we can therefore define a “normalized gap” to
be (q - p)/log p. The average value of this normalized
gap will be 1, but is it sometimes much smaller and
sometimes much bigger?
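
For a feel for the numbers, the normalized gaps can be tabulated for small primes. The following Python sketch uses a simple sieve and the cutoff 100 000, both of which are arbitrary illustrative choices; it says nothing about the limiting behavior that the question is really about.

```python
import math


def primes_below(limit):
    """All primes less than limit, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i in range(limit) if sieve[i]]


def normalized_gaps(limit):
    """The values (q - p) / log p for successive primes p < q below limit."""
    primes = primes_below(limit)
    return [(q - p) / math.log(p) for p, q in zip(primes, primes[1:])]


gaps = normalized_gaps(100_000)
print(sum(gaps) / len(gaps))  # close to 1
print(min(gaps), max(gaps))   # well below 1 and well above 1
```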

It was shown by Westzynthius in 1931 that even nor- 
malized gaps can be arbitrarily large, and it was widely 
believed that they could also be arbitrarily close to 
zero. (The famous twin-prime conjecture, that there
are infinitely many primes p for which p + 2 is also
a prime, implies this immediately.) However, it took
until 2005 for this to be proved, by Goldston, Pintz, and
Yıldırım. (See analytic number theory [IV.4 §§6-8]
for a discussion of this problem.) 

7 Determining Whether Different 
Mathematical Properties Are Compatible 

In order to understand a mathematical concept, such 
as that of a group or a manifold, there are various 
stages one typically goes through. Obviously it is a good
idea to begin by becoming familiar with a few representative
examples of the structure, and also with techniques
for building new examples out of old ones. It is
also extremely important to understand the homomorphisms,
or “structure-preserving functions,” from one
example of the structure to another, as was discussed
in SOME FUNDAMENTAL MATHEMATICAL DEFINITIONS
[I.3 §§4.1, 4.2].

Once one knows these basics, what is there left to 
understand? Well, for a general theory to be useful, it 
should tell us something about specific examples. For
instance, as we saw in section 3.2, Lagrange’s theorem
can be used to prove Fermat’s little theorem. Lagrange’s
theorem is a general fact about groups: that if G is a
group of size n, then the size of any subgroup of G must
be a factor of n. To obtain Fermat’s little theorem, one
applies Lagrange’s theorem to the particular case when
G is the multiplicative group of integers mod p. The
conclusion one obtains, that $a^p$ is always congruent to
a mod p, is far from obvious.
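
The conclusion of Fermat’s little theorem is easy to test numerically, even though it is not obvious why it holds. This Python sketch uses the built-in three-argument `pow` for modular exponentiation; the function name is illustrative.

```python
def fermat_holds(p):
    """Check that a**p is congruent to a mod p for every a with 0 <= a < p."""
    return all(pow(a, p, p) == a for a in range(p))


print([p for p in range(2, 30) if fermat_holds(p)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(fermat_holds(6))  # False, since 2**6 = 64 leaves remainder 4 mod 6
```

A check of this kind is evidence rather than proof; the proof is what Lagrange’s theorem supplies.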

However, what if we want to know something about a 
group G that might not be true for all groups? That is, 
suppose that we wish to determine whether G has some 
property P that some groups have and others do not. 
Since we cannot prove that the property P follows from 
the group axioms, it might seem that we are forced to 
abandon the general theory of groups and look at the 
specific group G. However, in many situations there is 
an intermediate possibility: to identify some fairly gen- 
eral property Q that the group G has, and show that Q 
implies the more particular property P that interests us. 

Here is an illustration of this sort of technique in a dif- 
ferent context. Suppose we wish to determine whether 
the polynomial $p(x) = x^4 - 2x^3 - x^2 - 2x + 1$ has a
real root. One method would be to study this particular
polynomial and try to find a root. After quite a lot
of effort we might discover that p(x) can be factorized
as $(x^2 + x + 1)(x^2 - 3x + 1)$. The first factor is always
positive, but if we apply the quadratic formula to the
second, we find that p(x) = 0 when $x = (3 \pm \sqrt{5})/2$. An
alternative method, which uses a bit of general theory, is
to notice that p(1) is negative (in fact, it equals -3) and
that p(x) is large when x is large (because then the $x^4$
term is far bigger than anything else), and then to use the
intermediate value theorem, the result that any continuous
function that is negative somewhere and positive
somewhere else must be zero somewhere in between. 

Notice that, with the second approach, there was still
some computation to do (finding a value of x for which
p(x) is negative), but that it was much easier than the
computation in the first approach (finding a value of
x for which p(x) is zero). In the second approach, we
established that p had the rather general property of
being negative somewhere, and used the intermediate 
value theorem to finish off the argument. 
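
Both approaches can be followed in a few lines of Python. The sketch below checks the factorization at several integer points, confirms that p(1) is negative and p(3) is positive, and then uses repeated bisection, which is the intermediate value theorem turned into a procedure. The interval [1, 3], the tolerance, and the function names are illustrative assumptions.

```python
import math


def p(x):
    return x**4 - 2 * x**3 - x**2 - 2 * x + 1


def factored(x):
    return (x**2 + x + 1) * (x**2 - 3 * x + 1)


def bisect_root(f, lo, hi, tol=1e-12):
    """Locate a zero of a continuous f, given that f(lo) and f(hi) differ in sign."""
    assert f(lo) * f(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # a sign change (or a zero) lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2


# First approach: the factorization agrees with p at many integer points.
print(all(p(k) == factored(k) for k in range(-10, 11)))  # True

# Second approach: p(1) < 0 < p(3), so a root lies between 1 and 3.
print(p(1), p(3))  # -3 13
root = bisect_root(p, 1, 3)
print(abs(root - (3 + math.sqrt(5)) / 2) < 1e-9)  # True
```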

There are many situations like this throughout math- 
ematics, and as they arise certain general properties 
become established as particularly useful. For example,
if you know that a positive integer n is prime, or
that a group G is Abelian (that is, gh = hg for any
two elements g and h of G), or that a function taking
complex numbers to complex numbers is holomorphic
[I.3 §5.6], then as a consequence of these general
properties you know a lot more about the objects in 
question. 

Once properties have established themselves as
important, they give rise to a large class of mathemati- 
cal questions of the following form: given a mathemat- 
ical structure and a selection of interesting properties 
that it might have, which combinations of these prop- 
erties imply which other ones? Not all such questions 
are interesting, of course (many of them turn out to
be quite easy and others are too artificial), but some of
them are very natural and surprisingly resistant to one’s 
initial attempts to solve them. This is usually a sign that 
one has stumbled on what mathematicians would call a 
“deep” question. In the rest of this section let us look at 
a problem of this kind. 

A group G is called finitely generated if there is some 
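finite set $\{x_1, x_2, \ldots, x_k\}$ of elements of G such that all
the rest can be written as products of elements in that
set. For example, the group $\mathrm{SL}_2(\mathbb{Z})$ consists of all 2 × 2
matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ such that a, b, c, and d are integers and
ad - bc = 1. This group is finitely generated: it is a nice
exercise to show that every such matrix can be built from
the four matrices $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, and $\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$ using
matrix multiplication. (See [I.3 §4.2] for a discussion of
matrices. A first step toward proving this result is to
show that $\begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & m+n \\ 0 & 1 \end{pmatrix}$.)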

Now let us consider a second property. If x is an ele- 
ment of a group G, then x is said to have finite order if 
there is some power of x that equals the identity. The 
smallest such power is called the order of x. For exam- 
ple, in the multiplicative group of integers mod 7, the 
identity is 1, and the order of the element 4 is 3, because 
$4^1 = 4$, $4^2 = 16 \equiv 2$, and $4^3 = 64 \equiv 1$ mod 7. As for 3,
its first six powers are 3, 2, 6, 4, 5, 1, so it has order
6. Now some groups have the very special property that
there is some integer n such that $x^n$ equals the identity
for every x, or, equivalently, the order of every x is a
factor of n. What can we say about such groups? 
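
The orders in the example can be computed mechanically. This Python sketch works in the multiplicative group of integers mod 7; the function name is illustrative, and the loop assumes that x is coprime to the modulus so that some power of x equals 1.

```python
def order_mod(x, m):
    """Order of x in the multiplicative group mod m (x must be coprime to m)."""
    power, k = x % m, 1
    while power != 1:
        power = power * x % m
        k += 1
    return k


orders = {x: order_mod(x, 7) for x in range(1, 7)}
print(orders)  # {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}

# Every order divides 6, so x**6 is the identity for every x in this group.
print(all(pow(x, 6, 7) == 1 for x in range(1, 7)))  # True
```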


Let us look first at the case where all elements have 
order 2. Writing e for the identity element, we are assuming
that $a^2 = e$ for every element a. If we multiply both
sides of this equation by the inverse $a^{-1}$, then we deduce
that $a = a^{-1}$. The opposite implication is equally easy,
so such groups are ones where every element is its own 
inverse. 

Now let a and b be two elements of G. For any two 
elements a and b of any group we have the identity 
$(ab)^{-1} = b^{-1}a^{-1}$ (simply because $abb^{-1}a^{-1} = aa^{-1} = e$),
and in our special group where all elements equal
their inverses we can deduce from this that $ab = (ab)^{-1} = b^{-1}a^{-1} = ba$.
That is, G is automatically Abelian. 

Already we have shown that one general property, 
that every element of G squares to the identity, implies 
another, that G is Abelian. Now let us add the condition
that G is finitely generated, and let $x_1, x_2, \ldots, x_k$
be a minimal set of generators. That is, suppose that
every element of G can be built up out of the $x_i$ and
that we need all of the $x_i$ to be able to do this. Because
G is Abelian and because every element is equal to its
own inverse, we can rearrange products of the $x_i$ into
a standard form, where each $x_i$ occurs at most once
and the indices increase. For example, take the product
$x_4x_3x_1x_4x_4x_1x_3x_1x_5$. Because G is Abelian, this equals
$x_1x_1x_1x_3x_3x_4x_4x_4x_5$, and because each element is its
own inverse this equals $x_1x_4x_5$, the standard form of
the original expression. 
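
The reduction to standard form amounts to counting how often each generator occurs and keeping those that occur an odd number of times. A minimal Python sketch follows, in which a product of generators is represented by the list of their indices; this representation and the function name are assumptions made for illustration.

```python
from collections import Counter


def standard_form(word):
    """Reduce a product of generators, given as a list of indices.

    Commutativity lets the generators be sorted, and since every generator
    is its own inverse, only the parity of each count matters.
    """
    counts = Counter(word)
    return sorted(i for i, c in counts.items() if c % 2 == 1)


print(standard_form([4, 3, 1, 4, 4, 1, 3, 1, 5]))  # [1, 4, 5]
```

A standard form is determined by which of the k generators appear in it, so there are at most $2^k$ of them.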

This shows that G can have at most $2^k$ elements, since
for each $x_i$ we have the choice of whether or not to
include it in the product (after it has been put in the form 
above). In particular, the properties “G is finitely gener- 
ated” and “every nonidentity element of G has order 2” 
imply the third property “G is finite.” It turns out to be 
fairly easy to prove that two elements whose standard 
forms are different are themselves different, so in fact G
has exactly $2^k$ elements (where k is the size of a minimal
set of generators). 

Now let us ask what happens if n is some integer 
greater than 2 and $x^n = e$ for every element x. That is,
if G is finitely generated and $x^n = e$ for every x, must G
be finite? This turns out to be a much harder question,
originally asked by Burnside [VI.59]. Burnside himself
showed that G must be finite if n = 3, but it was not
until 1968 that his problem was solved, when Adian and
Novikov proved the remarkable result that if $n \geq 4381$
then G does not have to be finite. There is of course a big
gap between 3 and 4381, and progress in bridging it has
been slow. It was only in 1992 that this was improved
to $n \geq 13$, by Ivanov. And to give an idea of how hard
the Burnside problem is, it is still not known whether a
group with two generators such that the fifth power of
every element is the identity must be finite. 

8 Working with Arguments that 
Are Not Fully Rigorous 

A mathematical statement is considered to be estab- 
lished when it has a proof that meets the high stan- 
dards of rigor that are characteristic of the subject. How- 
ever, nonrigorous arguments have an important place 
in mathematics as well. For example, if one wishes to 
apply a mathematical statement to another field, such as
physics or engineering, then the truth of the statement 
is often more important than whether one has proved it. 

However, this raises an obvious question: if one has 
not proved a statement, then what grounds could there 
be for believing it? There are in fact several different
kinds of nonrigorous justification, so let us look at some 
of them. 

8.1 Conditional Results 

As was mentioned earlier in this article, the Riemann 
hypothesis is the most famous unsolved problem in 
mathematics. Why is it considered so important? Why, 
for example, is it considered more important than the 
twin-prime conjecture, another problem to do with the 
behavior of the sequence of primes? 

The main reason, though not the only one, is that it 
and its generalizations have a huge number of interest- 
ing consequences. In broad terms, the Riemann hypoth- 
esis tells us that the appearance of a certain degree of 
“randomness” in the sequence of primes is not misleading:
in many respects, the primes really do behave like
an appropriately chosen random set of integers. 

If the primes behave in a random way, then one might 
imagine that they would be hard to analyze, but in fact
randomness can be an advantage. For example, it is randomness
that allows me to be confident that at least one
girl was born in London on every day of the twentieth
century. If the sex of babies were less random, I would
be less sure: there could be some strange pattern such as
girls being born on Mondays to Thursdays and boys on
Fridays to Sundays. Similarly, if I know that the primes 
behave like a random sequence, then I know a great deal 
about their average behavior in the long term. The Rie- 
mann hypothesis and its generalizations formulate in a 
precise way the idea that the primes, and other impor- 
tant sequences that arise in number theory, “behave ran- 
domly.” That is why they have so many consequences. 
There are large numbers of papers with theorems that 
are proved only under the assumption of some version
of the Riemann hypothesis. Therefore, anybody who
proves the Riemann hypothesis will change the status 
of all these theorems from conditional to fully proved. 

How should one regard a proof if it relies on the Rie- 
mann hypothesis? One could simply say that the proof 
establishes that such and such a result is implied by 
the Riemann hypothesis and leave it at that. But most 
mathematicians take a different attitude. They believe 
the Riemann hypothesis, and believe that it will one day 
be proved. So they believe all its consequences as well, 
even if they feel more secure about results that can be 
proved unconditionally. 

Another example of a statement that is generally 
believed and used as a foundation for a great deal of fur- 
ther research comes from theoretical computer science. 
As was mentioned in section 6.4 (iv), one of the main 
aims of computer science is to establish how quickly 
certain tasks can be performed by a computer. This aim 
splits into two parts: finding algorithms that work in as 
few steps as possible, and proving that every algorithm 
must take at least some particular number of steps. The 
second of these tasks is notoriously difficult: the best 
results known are far weaker than what is believed to be true.

There is, however, a class of computational problems, 
called NP-complete problems, that are known to be of 
equivalent difficulty. That is, an efficient algorithm for 
one of these problems can be converted into an efficient
algorithm for any other. Furthermore, it is almost universally
believed that there is in fact no efficient algorithm
for any of the problems, or, as it is usually expressed, 
that “P does not equal NP.” Therefore, if you want to 
demonstrate that no quick algorithm exists for some 
problem, all you have to do is prove that it is at least 
as hard as some problem that is already known to be 
NP-complete. This will not be a rigorous proof, but it 
will be a convincing demonstration, since most mathe- 
maticians are convinced that P does not equal NP. (See 
computational complexity [IV.21] for much more on
this topic.) 

Some areas of research depend on several conjectures 
rather than just one. It is as though researchers in such 
areas have discovered a beautiful mathematical land- 
scape and are impatient to map it out despite the fact
that there is a great deal that they do not understand. 
And this is often a very good research strategy, even 
from the perspective of finding rigorous proofs. There 
is far more to a conjecture than simply a wild guess: 
for it to be accepted as important, it should have been 
subjected to tests of many kinds. For example, does it 
have consequences that are already known to be true? 
Are there special cases that one can prove? If it were
true, would it help one solve other problems? Is it sup- 
ported by numerical evidence? Does it make a bold, pre- 
cise statement that would probably be easy to refute if 
it were false? It requires great insight and hard work to 
produce a conjecture that passes all these tests, but if 
one succeeds, one has not just an isolated statement, but 
a statement with numerous connections to other state- 
ments. This increases the chances that it will be proved, 
and greatly increases the chances that the proof of one 
statement will lead to proofs of others as well. Even a 
counterexample to a good conjecture can be extraordinarily
revealing: if the conjecture is related to many
other statements, then the effects of the counterexample 
will permeate the whole area. 

One area that is full of conjectural statements is alge- 
braic number theory [IV.3]. In particular, the Lang- 
lands program is a collection of conjectures, due to 
Robert Langlands, that relate number theory to rep- 
resentation theory (it is discussed in representation 
theory [IV.12 §6]). Between them, these conjectures 
generalize, unify, and explain large numbers of other 
conjectures and results. For example, the Shimura-Taniyama-Weil
conjecture, which was central to Andrew
Wiles’s proof of Fermat’s last theorem [V.12], forms
one small part of the Langlands program. The Lang- 
lands program passes the tests for a good conjecture 
supremely well, and has for many years guided the 
research of a large number of mathematicians. 

Another area of a similar nature is known as mirror 
symmetry [IV.14]. This is a sort of duality [III.19] that
relates objects known as Calabi-Yau manifolds [III.6],
which arise in algebraic geometry [IV.7] and also in
string theory [IV.13 §2], to other, dual manifolds. Just
as certain differential equations can become much easier
to solve if one looks at the Fourier transforms [III.27]
of the functions in question, so there are calculations 
arising in string theory that look impossible until one 
transforms them into equivalent calculations in the dual, 
or “mirror,” situation. There is at present no rigorous 
justification for the transformation, but this process has 
led to complicated formulas that nobody could possibly 
have guessed, and some of these formulas have been 
rigorously proved in other ways. Maxim Kontsevich has 
proposed a precise conjecture that would explain the 
apparent successes of mirror symmetry. 

8.2 Numerical Evidence 

The Goldbach conjecture [V.30] states that every even
number greater than or equal to 4 is the sum of two
primes. It seems to be well beyond what anybody could
hope to prove with today's mathematical machinery, 
even if one is prepared to accept statements such as the 
Riemann hypothesis. And yet it is regarded as almost 
certainly true. 

There are two principal reasons for believing Gold- 
bach’s conjecture. The first is a reason we have already 
met: one would expect it to be true if the primes are “ran- 
domly distributed.” This is because if n is a large even 
number, then there are many ways of writing n = a + b, 
and there are enough primes for one to expect that from 
time to time both a and b would be prime. 

Such an argument leaves open the possibility that for 
some value of n that is not too large one might be 
unlucky, and it might just happen that n - a was com- 
posite whenever a was prime. This is where numerical 
evidence comes in. It has now been checked that every 
even number up to $10^{14}$ can be written as a sum of
two primes, and once n is greater than this, it becomes 
extremely unlikely that it could “just happen,” by a fluke, 
to be a counterexample. 

This is perhaps rather a crude argument, but there is 
a way to make it even more convincing. If one makes 
more precise the idea that the primes appear to be ran- 
domly distributed, one can formulate a stronger version 
of Goldbach's conjecture that says not only that every 
even number can be written as a sum of two primes, but
also roughly how many ways there are of doing this. For 
instance, if a and n - a are both prime, then neither is a 
multiple of 3 (unless they are equal to 3 itself). If n is a 
multiple of 3, then this merely says that a is not a multiple
of 3, but if n is of the form 3m + 1 then a cannot be
of the form 3k + 1 either (or n - a would be a multiple
of 3). So, in a certain sense, it is twice as easy for n to 
be a sum of two primes if it is a multiple of 3. Taking 
this kind of information into account, one can estimate 
in how many ways it “ought” to be possible to write n as 
a sum of two primes. It turns out that, for every even n, 
there should be many such representations. Moreover, 
one’s predictions of how many are closely matched by 
the numerical evidence: that is, they are true for values 
of n that are small enough to be checked on a computer. 
This makes the numerical evidence much more convinc- 
ing, since it is evidence not just for Goldbach’s conjec- 
ture itself, but also for the more general principles that 
led us to believe it. 
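
Numerical evidence of this kind is easy to gather on a small scale. The Python sketch below counts, for each even n up to 2000, the ways of writing n = a + b with a ≤ b both prime, and compares the average count for multiples of 3 with the average for the rest. The cutoff of 2000, the decision to count unordered pairs, and the function names are all illustrative choices, and a check this small is in no way comparable to the computations referred to in the text.

```python
def prime_sieve(limit):
    """Bytearray s with s[i] == 1 exactly when i is prime, for 0 <= i <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return sieve


def goldbach_count(n, sieve):
    """Number of ways to write n = a + b with a <= b and both a and b prime."""
    return sum(1 for a in range(2, n // 2 + 1) if sieve[a] and sieve[n - a])


LIMIT = 2000
sieve = prime_sieve(LIMIT)
counts = {n: goldbach_count(n, sieve) for n in range(4, LIMIT + 1, 2)}

# Every even number in the range has at least one representation.
print(min(counts.values()) >= 1)  # True

multiples = [c for n, c in counts.items() if n % 3 == 0]
others = [c for n, c in counts.items() if n % 3 != 0]
print(sum(multiples) / len(multiples), sum(others) / len(others))
```

In this range the multiples of 3 have noticeably more representations on average, in line with the heuristic described above.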

This illustrates a general phenomenon: the more pre- 
cise the predictions that follow from a conjecture, the 
more impressive it is when they are confirmed by later 
numerical evidence. Of course, this is true not just of 
mathematics but of science more generally. 
8.3 “Illegal” Calculations

In section 6.3 it was stated that “almost nothing is 
known” about the average end-to-end distance of an n- 
step self-avoiding walk. That is a statement with which 
theoretical physicists would strongly disagree. Instead, 
they would tell you that the end-to-end distance of a 
typical n-step self-avoiding walk is somewhere in the 
region of $n^{3/4}$. This apparent disagreement is explained
by the fact that, although almost nothing has been rigorously
proved, physicists have a collection of nonrigorous
methods that, if used carefully, seem to give correct
results. With their methods, they have in some areas
managed to establish statements that go well beyond
what mathematicians can prove. Such results are fasci- 
nating to mathematicians, partly because if one regards 
the results of physicists as mathematical conjectures 
then many of them are excellent conjectures, by the 
standards explained earlier: they are deep, completely 
unguessable in advance, widely believed to be true, 
backed up by numerical evidence, and so on. Another 
reason for their fascination is that the effort to pro- 
vide them with a rigorous underpinning often leads to 
significant advances in pure mathematics. 

To give an idea of what the nonrigorous calculations 
of physicists can be like, here is a rough description of a 
famous argument of Pierre-Gilles de Gennes, which lies 
behind some of the results (or predictions, if you pre- 
fer to call them that) of physicists. In statistical physics 
there is a model known as the n-vector model, closely 
related to the Ising and Potts models described in
PROBABILISTIC MODELS OF CRITICAL PHENOMENA [IV.26]. At
each point of $\mathbb{Z}^d$ one places a unit vector in $\mathbb{R}^n$. This
gives rise to a random configuration of unit vectors, with 
which one associates an “energy” that increases as the
angles between neighboring vectors increase. De Gennes
found a way of transforming the self-avoiding walk problem
so that it could be regarded as a question about the
n-vector model in the case n = 0. The 0-vector prob- 
lem itself does not make obvious sense, since there is 
no such thing as a unit vector in $\mathbb{R}^0$, but de Gennes was
nevertheless able to take parameters associated with the 
n-vector model and show that if you let n converge to 
zero then you obtained parameters associated with self- 
avoiding walks. He proceeded to choose other parame- 
ters in the n-vector model to derive information about 
self-avoiding walks, such as the expected end-to-end 
distance. 

To a pure mathematician, there is something very worrying
about this approach. The formulas that arise in
the n-vector model do not make sense when n = 0, so
instead one has to regard them as limiting values when
n tends to zero. But n is very clearly a positive integer in 
the n-vector model, so how can one say that it tends to 
zero? Is there some way of defining an n-vector model 
for more general n? Perhaps, but nobody has found one. 
And yet de Gennes's argument, like many other argu- 
ments of a similar kind, leads to remarkably precise pre- 
dictions that agree with numerical evidence. There must 
be a good reason for this, even if we do not understand 
what it is. 

The examples in this section are just a few illus- 
trations of how mathematics is enriched by nonrigor- 
ous arguments. Such arguments allow one to penetrate 
much further into the mathematical unknown, opening
up whole areas of research into phenomena that
would otherwise have gone unnoticed. Given this, one 
might wonder whether rigor is important: if the results 
established by nonrigorous arguments are clearly true, 
then is that not good enough? As it happens, there are 
examples of statements that were “established” by non- 
rigorous methods and later shown to be false, but the 
most important reason for caring about rigor is that the 
understanding one gains from a rigorous proof is frequently
deeper than the understanding provided by a
nonrigorous one. The best way to describe the situation
is perhaps to say that the two styles of argument have 
profoundly benefited each other and will undoubtedly 
continue to do so. 

9 Finding Explicit Proofs and Algorithms 

There is no doubt that the equation $x^5 - x - 13 = 0$ has
a solution. After all, if we set $f(x) = x^5 - x - 13$, then
f(1) = -13 and f(2) = 17, so somewhere between 1
and 2 there will be an x for which f(x) = 0. 

That is an example of a pure existence argument: in
other words, an argument that establishes that something
exists (in this case, a solution to a certain equation),
without telling us how to find it. If the equation
had been $x^2 - x - 13 = 0$, then we could have used
an argument of a very different sort: the formula for
quadratic equations tells us that there are precisely two
solutions, and it even tells us what they are (they are
$(1 + \sqrt{53})/2$ and $(1 - \sqrt{53})/2$). However, there is no
similar formula for quintic equations (see THE INSOLUBILITY
OF THE QUINTIC [V.24]).

These two arguments illustrate a fundamental dichot- 
omy in mathematics. If you are proving that a math- 
ematical object exists, then sometimes you can do so 
explicitly, by actually describing that object, and some- 
times you can do so only indirectly, by showing that its 
nonexistence would lead to a contradiction. 
There is also a spectrum of possibilities in between.
As it was presented, the argument above showed merely
that the equation $x^5 - x - 13 = 0$ has a solution between
1 and 2, but it also suggests a method for calculating 
that solution to any desired accuracy. If, for example, 
you want to know it to two decimal places, then run 
through the numbers 1, 1.01, 1.02, . . . , 1.99, 2 evaluating 
f at each one. You will find that f(1.71) is approximately
-0.0889 and f(1.72) is approximately 0.334, so
there must be a solution between the two (which the calculations
suggest will be closer to 1.71 than to 1.72).
And in faet there are mueh better ways, such as new- 
ton’s method [II.4 §2.3], of approximating solutions. 
For many purposes, a pretty formula for a solution is 
less important than a method of calculating or approxi- 
mating it. (See Numerical analysis [IV.20 §1] for a fur- 
ther discussion of this point.) And if one has a method, 
its usefulness depends very mueh on whether it works 
quickly. 
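
The two approaches just mentioned can be tried side by side. What follows is only a minimal sketch: the function names, the step size of 0.01, and the starting point 2.0 for Newton's method are choices made for illustration, not anything prescribed by the text.

```python
def f(x):
    return x**5 - x - 13

def f_prime(x):
    return 5 * x**4 - 1

def scan_for_sign_change(step=0.01, lo=1.0, hi=2.0):
    """Run through lo, lo + step, ..., hi and return the first pair of
    consecutive grid points (a, b) with f(a) < 0 <= f(b)."""
    n = round((hi - lo) / step)
    for k in range(n):
        a = lo + k * step
        b = lo + (k + 1) * step
        if f(a) < 0 <= f(b):
            return round(a, 10), round(b, 10)
    raise ValueError("no sign change found")

def newton(x0, tol=1e-12, max_iter=50):
    """Newton's method: repeatedly replace x by x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

bracket = scan_for_sign_change()
root = newton(2.0)
print(bracket, root)
```

The scan needs up to a hundred evaluations of f to pin down two decimal places, whereas the Newton iteration reaches many more digits in a handful of steps, which is the sense in which speed matters.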

Thus, at one end of the spectrum one has simple for-
mulas that define mathematical objects and can easily
be used to find them, at the other one has proofs that
establish existence but give no further information, and
in between one has proofs that yield algorithms for find-
ing the objects, algorithms that are significantly more
useful if they run quickly.

Just as, all else being equal, a rigorous argument is
preferable to a nonrigorous one, so an explicit or algo-
rithmic argument is worth looking for even if an indirect
one is already established, and for similar reasons: the
effort to find an explicit argument very often leads to
new mathematical insights. (Less obviously, as we shall
soon see, finding indirect arguments can also lead to new
insights.)

One of the most famous examples of a pure exis-
tence argument concerns transcendental numbers
[III.43], which are real numbers that are not roots of any
polynomial with integer coefficients. The first person to
prove that such numbers existed was Liouville [VI.38],
in 1844. He proved that a certain condition was suffi-
cient to guarantee that a number was transcendental
and demonstrated that it is easy to construct numbers
satisfying his condition (see Liouville’s theorem and
Roth’s theorem [V.25]). After that, various important
numbers such as e and π were proved to be transcenden-
tal, but these proofs were difficult. Even now there are
many numbers that are almost certainly transcenden-
tal but which have not been proved to be transcenden-
tal. (See irrational and transcendental numbers
[III.43] for more information about this.)


All the proofs mentioned above were direct and 
explicit. Then in 1873 Cantor [VI.53] provided a com-
pletely different proof of the existence of transcenden-
tal numbers, using his theory of countability [III.11].
He proved that the algebraic numbers were countable 
and the real numbers uncountable. Since countable sets 
are far smaller than uncountable sets, this showed that 
almost every real number (though not necessarily almost 
every real number you will actually meet) is transcenden- 
tal. 

In this instance, each of the two arguments tells us
something that the other does not. Cantor’s proof shows
that there are transcendental numbers, but it does not
provide us with a single example. (Strictly speaking, this
is not true: one could specify a way of listing the alge-
braic numbers and then apply Cantor’s famous diago-
nal argument to that particular list. However, the result-
ing number would be virtually devoid of meaning.) Liou-
ville’s proof is much better in that way, as it gives us
a method of constructing several transcendental num-
bers with fairly straightforward definitions. However,
if one knew only the explicit arguments such as Liou-
ville’s and the proofs that e and π are transcendental,
then one might have the impression that transcendental
numbers are numbers of a very special kind. The insight
that is completely missing from these arguments, but
present in Cantor’s proof, is that a typical real number
is transcendental.

For much of the twentieth century, highly abstract
and indirect proofs were fashionable, but in more recent
years, especially with the advent of the computer, atti-
tudes have changed. (Of course, this is a very general
statement about the entire mathematical community
rather than about any single mathematician.) Nowadays,
more attention is often paid to the question of whether a
proof is explicit, and, if so, whether it leads to an efficient
algorithm.

Needless to say, algorithms are interesting in them- 
selves, and not just for the light they shed on mathe- 
matical proofs. Let us conclude this section with a brief 
description of a particularly interesting algorithm that 
has been developed by several authors over the last 
few years. It gives a way of computing the volume of 
a high-dimensional convex body. 

A shape K is called convex if, given any two points x
and y in K, the line segment joining x to y lies entirely
inside K. For example, a square or a triangle is convex,
but a five-pointed star is not. This concept can be gener-
alized straightforwardly to n dimensions, for any n, as
can the notions of area and volume.

Now let us suppose that an n-dimensional convex
body K is specified for us in the following sense: we
have a computer program that runs quickly and tells us,
for each point (x_1, . . . , x_n), whether or not that point
belongs to K. How can we estimate the volume of K?
One of the most powerful methods for problems like
this is statistical: you choose points at random and see
whether they belong to K, basing your estimate of the
volume of K on the frequency with which they do. For
example, if you wanted to estimate π, you could take a
circle of radius 1, enclose it in a square of side-length
2, and choose a large number of points randomly from
the square. Each point has a probability π/4 (the ratio
of the area π of the circle to the area 4 of the square)
of belonging to the circle, so we can estimate π by tak-
ing the proportion of points that fall in the circle and
multiplying it by 4.
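
The estimate of π just described takes only a few lines to try. This is a minimal sketch: the function name, the number of points, and the fixed seed (used so that the run is repeatable) are illustrative assumptions.

```python
import random

def estimate_pi(num_points, seed=0):
    """Choose points uniformly from the square [-1, 1] x [-1, 1] and
    return 4 times the proportion that fall in the circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points

pi_estimate = estimate_pi(200_000)
print(pi_estimate)
```

The error of such an estimate shrinks only like one over the square root of the number of points, so it is a way of getting a rough answer cheaply rather than many digits.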

This approach works quite easily for very low dimen- 
sions but as soon as n is at all large it runs into a 
severe difficulty. Suppose for example that we were to 
try to use the same method for estimating the vol- 
ume of an n-dimensional sphere. We would enclose that 
sphere in an n-dimensional cube, choose points at ran- 
dom in the cube, and see how often they belonged to 
the sphere as well. However, the ratio of the volume of 
an n-dimensional sphere to that of an n-dimensional 
cube that contains it is exponentially small, which means 
that the number of points you have to pick before even 
one of them lands in the sphere is exponentially large. 
Therefore, the method becomes hopelessly impractical. 
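
To see how quickly the difficulty sets in, one can compute the ratio exactly, using the standard formula π^(n/2) / Γ(n/2 + 1) for the volume of the n-dimensional ball of radius 1 and taking the enclosing cube to be [-1, 1]^n. The function name and the particular dimensions printed are chosen only for illustration.

```python
import math

def ball_to_cube_ratio(n):
    """Volume of the unit n-ball divided by the volume 2**n of the cube [-1, 1]^n."""
    ball_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return ball_volume / 2 ** n

for n in (2, 3, 10, 20, 50):
    ratio = ball_to_cube_ratio(n)
    # 1 / ratio is the expected number of random points needed per hit.
    print(n, ratio, "roughly", math.ceil(1 / ratio), "points per hit")
```

For n = 2 the ratio is the familiar π/4, but by n = 20 fewer than one point in ten million lands in the sphere.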

All is not lost, though, because there is a trick for get-
ting around this difficulty. You define a sequence of con-
vex bodies, K_0, K_1, . . . , K_m, each contained in the next,
starting with the convex body whose volume you want
to know, and ending with the cube, in such a way that
the volume of K_i is always at least half that of K_{i+1}. Then
for each i you estimate the ratio of the volumes of K_{i-1}
and K_i. The product of all these ratios will be the ratio
of the volume of K_0 to that of K_m. Since you know the
volume of K_m, this tells you the volume of K_0.
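
In symbols, the product telescopes. The display below merely restates the paragraph above, with vol(·) used as an assumed shorthand for n-dimensional volume:

```latex
\[
  \frac{\operatorname{vol}(K_0)}{\operatorname{vol}(K_m)}
  = \prod_{i=1}^{m} \frac{\operatorname{vol}(K_{i-1})}{\operatorname{vol}(K_i)},
  \qquad
  \frac{\operatorname{vol}(K_{i-1})}{\operatorname{vol}(K_i)} \ge \frac{1}{2}
  \quad (1 \le i \le m).
\]
```

The point of the second condition is that a random point of K_i lands in K_{i-1} at least half the time, so no single ratio suffers from the "exponentially rare hit" problem described earlier.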

How do you estimate the ratio of the volumes of K_{i-1}
and K_i? You simply choose points at random from K_i
and see how many of them belong to K_{i-1}. However, it
is just here that the true subtlety of the problem arises:
how do you choose points at random from a convex body
K_i that you do not know much about? Choosing a ran-
dom point in the n-dimensional cube is easy, since all
you need to do is independently choose n random num-
bers x_1, . . . , x_n, each between -1 and 1. But for a general
convex body it is not easy at all.


There is a wonderfully clever idea that gets around
this problem. It is to design carefully a random walk 
that starts somewhere inside the convex body and at 
each step moves to another point, chosen at random 
from just a few possibilities. The more random steps 
of this kind that are taken, the less can be said about 
where the point is, and if the walk is defined prop- 
erly, it can be shown that after not too many steps, 
the point reached is almost purely random. However, 
the proof is not at all easy. (It is discussed further in
high-dimensional geometry and its probabilistic
analogues [IV.24 §6].)
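
To make the idea concrete, here is a deliberately naive sketch of a walk of this general kind. It is not the carefully designed walk of the actual algorithm, and it proves nothing about how random the final point is; the step size, the rule of staying put when a step would leave the body, and all names are assumptions made for illustration. The body is known only through a membership test, as in the setup above.

```python
import random

def lattice_walk(is_inside, start, steps, step_size=0.05, seed=0):
    """Walk inside a convex body known only through the test `is_inside`.

    At each step, pick one coordinate and move it by +step_size or
    -step_size (2n possibilities); if that would leave the body, stay put.
    """
    rng = random.Random(seed)
    point = list(start)
    if not is_inside(point):
        raise ValueError("the walk must start inside the body")
    for _ in range(steps):
        axis = rng.randrange(len(point))
        delta = rng.choice((-step_size, step_size))
        candidate = point.copy()
        candidate[axis] += delta
        if is_inside(candidate):
            point = candidate
    return point

def in_unit_ball(p):
    """Membership test for the unit ball, standing in for the 'computer program'."""
    return sum(x * x for x in p) <= 1.0

endpoints = [lattice_walk(in_unit_ball, [0.0] * 5, steps=2000, seed=s)
             for s in range(20)]
print(endpoints[0])
```

The hard part, which this sketch does not touch, is showing how many steps are needed before the endpoint is close to uniformly distributed.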

For further discussion of algorithms and their math-
ematical importance, see computational number
theory [IV.5], computational complexity [IV.21],
and the mathematics of algorithm design [VII.5].

10 What Do You Find in a Mathematical Paper?

Mathematical papers have a very distinctive style, one
that became established early in the twentieth century. 
This final section is a description of what mathemati- 
cians actually produce when they write. 

A typical paper is usually a mixture of formal and 
informal writing. Ideally (but by no means always), the 
author writes a readable introduction that tells the 
reader what to expect from the rest of the paper. And 
if the paper is divided into sections, as most papers are 
unless they are quite short, then it is also very helpful 
to the reader if each section can begin with an informal 
outline of the arguments to follow. But the main sub- 
stance of the paper has to be more formal and detailed, 
so that readers who are prepared to make a sufficient 
effort can convince themselves that it is correct. 

The object of a typical paper is to establish mathe- 
matical statements. Sometimes this is an end in itself: 
for example, the justification for the paper may be that it 
proves a conjecture that has been open for twenty years. 
Sometimes the mathematical statements are established 
in the service of a wider aim, such as helping to explain 
a mathematical phenomenon that is poorly understood. 
But either way, mathematical statements are the main 
currency of mathematics. 

The most important of these statements are usu- 
ally called theorems, but one also finds statements 
called propositions, lemmas, and corollaries. One can-
not always draw sharp distinctions between these kinds
of statements, but in broad terms this is what the dif- 
ferent words mean. A theorem is a statement that you 
regard as intrinsically interesting, a statement that you
might think of isolating from the paper and telling other 
mathematicians about in a seminar, for instance. The 
statements that are the main goals of a paper are usually 
called theorems. A proposition is a bit like a theorem, but 
it tends to be slightly “boring.” It may seem odd to want 
to prove boring results, but they can be important and 
useful. What makes them boring is that they do not sur- 
prise us in any way. They are statements that we need, 
that we expect to be true, and that we do not have much 
difficulty proving. 

Here is a quick example of a statement that one might 
choose to call a proposition. The associative law for 
a binary operation [I.2 §2.4] states that x * (y *
z) = (x * y) * z. One often describes this law informally
by saying that “brackets do not matter.” However, while
it shows that we can write x * y * z without fear of
ambiguity, it does not show quite so obviously that we
can write a * b * c * d * e, for example. How do we
know that, just because the positions of brackets do not 
matter when you have three objects, they do not matter 
when you have more than three? 

Many mathematics students go happily through uni- 
versity without noticing that this is a problem. It just 
seems obvious that the associative law shows that brack- 
ets do not matter. And they are basically right: although 
it is not completely obvious, it is certainly not a surprise 
and turns out to be easy to prove. Since we often need 
this simple result and could hardly call it a theorem, we 
might call it a proposition instead. To get a feel for how 
to prove it, you might wish to show that the associative 
law implies that 

(a * ((b * c) * d)) * e = a * (b * ((c * d) * e)).
Then you can try to generalize what it is you are doing. 
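
Before attempting the proof, it can be reassuring to check the claim by brute force. The sketch below (whose names are illustrative assumptions) evaluates every way of bracketing a product of five objects, once for an associative operation and once for one that is not. It is a check on examples, not a proof.

```python
from operator import add, sub

def all_bracketings(items, op):
    """Return the value of every full bracketing of items[0] op items[1] op ..."""
    if len(items) == 1:
        return [items[0]]
    values = []
    # The outermost operation splits the list into a left and a right part.
    for split in range(1, len(items)):
        for left in all_bracketings(items[:split], op):
            for right in all_bracketings(items[split:], op):
                values.append(op(left, right))
    return values

# String concatenation is associative: every bracketing gives the same word.
concatenations = all_bracketings(["a", "b", "c", "d", "e"], add)
# Subtraction is not associative: different bracketings give different values.
differences = all_bracketings([1, 2, 3, 4, 5], sub)

print(len(concatenations), set(concatenations))
print(sorted(set(differences)))
```

The recursion mirrors the structure of the usual induction: every bracketing has an outermost operation, and that is where a proof can get a grip.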

Often, if you are trying to prove a theorem, the proof 
becomes long and complicated, in which case if you want 
anybody to read it you need to make the structure of the 
argument as clear as possible. One of the best ways of
doing this is to identify subgoals, which take the form of 
statements intermediate between your initial assump- 
tions and the conclusion you wish to draw from them. 
These statements are usually called lemmas. Suppose, 
for example, that you are trying to give a very detailed 
presentation of the standard proof that √2 is irrational.
One of the facts you will need is that every fraction p/q
is equal to a fraction r/s with r and s not both even, and
this fact requires a proof. For the sake of clarity, you
might well decide to isolate this proof from the main
proof and call the fact a lemma. Then you have split
your task into two separate tasks: proving the lemma, 
and proving the main theorem using the lemma. One 


can draw a parallel with computer programming: if you 
are writing a complicated program, it is good practice 
to divide your main task into subtasks and write sepa- 
rate mini-programs for them, which you can then treat 
as “black boxes,” to be called upon by other parts of the 
program whenever they are useful. 
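
The parallel can be taken literally. Here is the lemma about fractions written as a small helper of the "black box" kind; the function name is a hypothetical one, and the loop is simply the obvious argument (keep halving while both numbers are even) turned into code.

```python
from fractions import Fraction

def not_both_even(p, q):
    """The lemma as a helper: return (r, s) with r/s == p/q and r, s not both even."""
    if q == 0:
        raise ValueError("denominator must be nonzero")
    # Each pass halves q, and q is nonzero, so the loop must stop.
    while p % 2 == 0 and q % 2 == 0:
        p //= 2
        q //= 2
    return p, q

r, s = not_both_even(48, 80)
print(r, s)
```

A caller proving the main theorem need only know what the helper guarantees, not how it works, which is exactly the role a lemma plays in a proof.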

Some lemmas are difficult to prove and are useful in
many different contexts, so the most important lemmas 
can be more important than the least important theo- 
rems. However, a general rule is that a result will be 
called a lemma if the main reason for proving it is in 
order to use it as a stepping stone toward the proofs of 
other results. 

A corollary of a mathematical statement is another 
statement that follows easily from it. Sometimes the 
main theorem of a paper is followed by several corollar- 
ies, which advertise the strength of the theorem. Some- 
times the main theorem itself is labeled a corollary, 
because all the work of the proof goes into proving a 
different, less punchy statement from which the theo- 
rem follows very easily. If this happens, the author may 
wish to make clear that the corollary is the main result 
of the paper, and other authors would refer to it as a 
theorem. 

A mathematical statement is established by means 
of a proof. It is a remarkable feature of mathematics 
that proofs are possible: that, for example, an argu-
ment invented by Euclid [VI.2] over two thousand years
ago can still be accepted today and regarded as a com-
pletely convincing demonstration. It took until the late
nineteenth and early twentieth centuries for this phe- 
nomenon to be properly understood, when the language 
of mathematics was formalized (see the language and 
grammar of mathematics [I.2], and especially sec-
tion 4, for an idea of what this means). Then it became
possible to make precise the notion of a proof as well. 
From a logician’s point of view a proof is a sequence 
of mathematical statements, each written in a formal 
language, with the following properties: the first few 
statements are the initial assumptions, or premises; each
remaining statement in the sequence follows from ear- 
lier ones by means of logical rules that are so simple 
that the deductions are clearly valid (for instance rules 
such as “if P ∧ Q is true then P is true,” where “∧” is the
logical symbol for “and”); and the final statement in the 
sequence is the statement that is to be proved. 
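
Modern proof assistants make this notion tangible. As a small illustration (an assumption about presentation, since the text itself mentions no particular system), here is the rule quoted above written as a complete formal proof in Lean 4: the premise is `h`, and the single deduction step is the built-in rule `And.left`.

```lean
-- From the single premise `h : P ∧ Q`, conclude `P`.
-- `And.left` is the rule "if P ∧ Q is true then P is true".
example (P Q : Prop) (h : P ∧ Q) : P :=
  And.left h
```

Even this one-line proof is checked mechanically, which is what it means for each step to be so simple that the deduction is clearly valid.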

The above idea of a proof is a considerable idealization 
of what actually appears in a normal mathematical paper 
under the heading “Proof.” That is because a purely for-
mal proof would be very long and almost impossible
to read. And yet, the fact that arguments can in princi-
ple be formalized provides a very valuable underpinning
for the edifice of mathematics, because it gives a way
of resolving disputes. If a mathematician produces an
argument that is strangely unconvincing, then the best
way to see whether it is correct is to ask him or her to
explain it more formally and in greater detail. This will 
usually either expose a mistake or make it clearer why 
the argument works. 

Another very important component of mathematical 
papers is definitions. This book is full of them: see in 
particular part III. Some definitions are given simply 
because they enable one to speak more concisely. For 
example, if I am proving a result about triangles and I 
keep needing to consider the distances between the ver- 
tices and the opposite sides, then it is a nuisance to have 
to say “the distances from A, B, and C to the lines BC, AC, 
and AB, respectively,” so instead I will probably choose 
a word like “altitude” and write, “Given a vertex of a tri- 
angle, define its altitude to be the distance from that 
vertex to the opposite side.” If I am looking at triangles 
with obtuse angles, then I will have to be more careful: 
“Given a vertex A of a triangle ABC, define its altitude 
to be the distance from A to the unique line that passes 
through B and C.” From then on, I can use the word “alti-
tude” and the exposition of my proof will be much more
crisp.

Definitions like this are mere definitions of conve- 
nience. When the need arises, it is pretty obvious what 
to do and one does it. But the really interesting defini- 
tions are ones that are far from obvious and that make 
you think in new ways once you know them. A very good 
example is the definition of the derivative of a function.
If you do not know this definition, you will have no idea
how to find out for which nonnegative x the function
f(x) = 2x^3 - 3x^2 - 6x + 1 takes its smallest value. If you
do know it, then the problem becomes a simple exercise.
That is perhaps an exaggeration, since you also need to
know that the minimum will occur either at 0 or at a
point where the derivative vanishes, and you will need
to know how to differentiate f(x), but these are simple
facts (propositions rather than theorems), and the real
breakthrough is the concept itself.
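
For the record, the exercise runs as follows. The derivative is f'(x) = 6x^2 - 6x - 6 = 6(x^2 - x - 1), whose only nonnegative root is (1 + √5)/2, so it remains to compare the value there with the value at 0. The short sketch below, with variable names chosen for illustration, carries out that comparison numerically.

```python
import math

def f(x):
    return 2 * x**3 - 3 * x**2 - 6 * x + 1

# f'(x) = 6x^2 - 6x - 6 = 6(x^2 - x - 1); its only nonnegative root:
critical_point = (1 + math.sqrt(5)) / 2

# On x >= 0 the minimum occurs at 0 or where the derivative vanishes.
candidates = [0.0, critical_point]
x_min = min(candidates, key=f)
print(x_min, f(x_min))
```

So the smallest value is taken at the golden ratio, where f equals -5(1 + √5)/2, or about -8.09.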

There are many other examples of definitions like 
this, but interestingly they are more common in some 
branches of mathematics than in others. Some mathe- 
maticians will tell you that the main aim of their research 
is to find the right definition, after which their whole 
area will be illuminated. Yes, they will have to write 
proofs, but if the definition is the one they are look- 
ing for, then these proofs will be fairly straightforward. 


And yes, there will be problems they can solve with the 
help of the new definition, but, like the minimization 
problem above, these will not be central to the theory. 
Rather, they will demonstrate the power of the defini- 
tion. For other mathematicians, the main purpose of def- 
initions is to prove theorems, but even very theorem- 
oriented mathematicians will from time to time find 
that a good definition can have a major effect on their
problem-solving prowess. 

This brings us to mathematical problems. The main 
aim of an article in mathematics is usually to prove the- 
orems, but one of the reasons for reading an article is 
to advance one’s own research. It is therefore very wel- 
come if a theorem is proved by a technique that can be 
used in other contexts. It is also very welcome if an arti- 
cle contains some good unsolved problems. By way of 
illustration, let us look at a problem that most mathe- 
maticians would not take all that seriously, and try to 
see what it lacks. 

A number is called palindromic if its representation in 
base 10 is a palindrome: some simple examples are 22, 
131, and 548 845. Of these, 131 is interesting because it 
is also a prime. Let us try to find some more prime palin- 
dromic numbers. Single-digit primes are of course palin- 
dromic, and two-digit palindromic numbers are multi- 
ples of 11, so only 11 itself is also a prime. So let us 
move quickly on to three-digit numbers. Here there turn 
out to be several examples: 101, 131, 151, 181, 191, 313, 353,
373, 383, 727, 757, 787, 797, 919, and 929. It is not hard
to show that every palindromic number with an even
number of digits is a multiple of 11, but the palindromic
primes do not stop at 929: for example, 10 301 is the
next smallest. 
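
Lists like this are easy to generate by machine. The following sketch uses trial division, which is perfectly adequate for numbers this small; the function names and the search limit of 20 000 are illustrative choices.

```python
def is_prime(n):
    """Trial division: fine for small n, far too slow for large ones."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_palindromic(n):
    digits = str(n)  # the representation of n in base 10
    return digits == digits[::-1]

palindromic_primes = [n for n in range(2, 20_000)
                      if is_palindromic(n) and is_prime(n)]
three_digit = [p for p in palindromic_primes if 100 <= p <= 999]
next_after_929 = min(p for p in palindromic_primes if p > 929)

print(three_digit)
print(next_after_929)
```

Note that the search finds no four-digit examples at all, in line with the remark about multiples of 11.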

And now anybody with a modicum of mathematical
curiosity will ask the question: are there infinitely many
palindromic primes? This, it turns out, is an unsolved 
problem. It is believed (on the combined grounds that 
the primes should be sufficiently random and that palin- 
dromic numbers with an odd number of digits do not 
seem to have any particular reason to be factorizable) 
that there are, but nobody knows how to prove it. 

This problem has the great virtue of being easy to
understand, which makes it appealing in the way that
Fermat’s last theorem [V.12] and Goldbach’s con-
jecture [V.30] are appealing. And yet, it is not a cen-
tral problem in the way that those two are: most math-
ematicians would put it into a mental box marked
“recreational” and forget about it.

What explains this dismissive attitude? Are the primes 
not central objects of study in mathematics? Well, yes 
they are, but palindromic numbers are not. And the
main reason they are not is that the definition of “palin- 
dromic” is extremely unnatural. If you know that a num- 
ber is palindromic, what you know is less a feature of 
the number itself and more a feature of the particular 
way that, for accidental historical reasons, we choose to 
represent it. In particular, the property depends on our 
choice of the number 10 as our base. For example, if we 
write 131 in base 3, then it becomes 11212, which is no 
longer the same when written backwards. By contrast, a 
prime number is prime however you write it. 

This is not quite a complete explanation, since 
there could conceivably be interesting properties that 
involved the number 10, or at least some artificial choice 
of number, in an essential way. For example, the prob-
lem of whether there are infinitely many primes of the
form 2^n - 1 is considered interesting, despite the use of
the particular number 2. However, the choice of 2 can be
justified here: a^n - 1 has a factor a - 1, so for any larger
integer the answer would be no. Moreover, numbers of
the form 2^n - 1 have special properties that make them
more likely to be prime. (See computational number
theory [IV.5] for an explanation of this point.)

But even if we replace 10 by the “more natural” num- 
ber 2 and look at numbers that are palindromic when 
written in binary, we still do not obtain a property that 
would be considered a serious topic for research. Sup- 
pose that, given an integer n, we define r(n) to be the 
reverse of n — that is, the number obtained if you write 
n in binary and then reverse its digits. Then a palin- 
dromic number, in the binary sense, is a number n such 
that n = r(n). But the function r(n) is very strange 
and “unmathematical.” For instance, the reverses of the 
numbers from 1 to 20 are 1, 1, 3, 1, 5, 3, 7, 1, 9, 5, 13, 3, 
11, 7, 15, 1, 17, 9, 25, and 5, which gives us a sequence 
with no obvious pattern. Indeed, when one calculates 
this sequence, one realizes that it is even more artifi-
cial than it at first seemed. One might imagine that the
reverse of the reverse of a number is the number itself, 
but that is not so. If you take the number 10, for exam- 
ple, it is 1010 in binary, so its reverse is 0101, which 
is the number 5. But this we would normally write as 
101, so the reverse of 5 is not 10 but 5. But we cannot 
solve this problem by deciding to write 5 as 0101, since 
then we would have the problem that 5 was no longer 
palindromic, when it clearly ought to be. 
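
The function r is easy to compute, which makes its oddness easy to inspect. In this sketch the name `binary_reverse` is an illustrative choice; note that converting the reversed string back to a number silently discards any leading zeros, which is precisely the source of the trouble just described.

```python
def binary_reverse(n):
    """r(n): write n in binary, reverse the digits, and read the result as a number."""
    return int(bin(n)[2:][::-1], 2)

reverses = [binary_reverse(n) for n in range(1, 21)]
print(reverses)

# The reverse of the reverse need not be the original number: 10 -> 5 -> 5.
print(binary_reverse(10), binary_reverse(binary_reverse(10)))
```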

Does this mean that nobody would be interested in 
a proof that there were infinitely many palindromic
primes? Not at all. It can be shown quite easily that the
number of palindromic numbers less than n is in the
region of √n, which is a very small fraction indeed. It is
notoriously hard to prove results about primes in sparse 


sets like this, so a solution to this conjecture would be 
a big breakthrough. However, the definition of “palin- 
dromic” is so artificial that there seems to be no way of 
using it in a detailed way in a mathematical proof. The 
only realistic hope of solving this problem would be to 
prove a much more general result, of which this would 
be just one of many consequences. Such a result would 
be wonderful, and undeniably interesting, but you will 
not discover it by thinking about palindromic numbers. 
Instead, you would be better off either trying to formu- 
late a more general question, or else looking at a more 
natural problem of a similar kind. An example of the lat-
ter is this: are there infinitely many primes of the form
m^2 + 1 for some positive integer m?

Perhaps the most important feature of a good prob- 
lem is generality: the solution to a good problem should 
usually have ramifications beyond the problem itself. A 
more accurate word for this desirable quality is “gen- 
eralizability,” since some excellent problems may look 
rather specific. For example, the statement that √2 is
irrational looks as though it is about just one number,
but once you know how to prove it, you will have no
difficulty in proving that √3 is irrational as well, and
in fact the proof can be generalized to a much wider
class of numbers (see algebraic numbers [IV.3 §14]). 
It is quite common for a good problem to look uninter- 
esting until you start to think about it. Then you realize 
that it has been asked for a reason: it might be the “first 
difficult case” of a more general problem, or it might be 
just one well-chosen example of a cluster of problems, 
all of which appear to run up against the same difficulty. 

Sometimes a problem is just a question, but frequently 
the person who asks a mathematical question has a good 
idea of what the answer is. A conjecture is a mathemati- 
cal statement that the author firmly believes but cannot 
prove. As with problems, some conjectures are better 
than others: as we have already discussed in section 8.1, 
the very best conjectures can have a major effect on the
direction of mathematical research. 

Part II


The Origins of 
Modern Mathematics 


II.1 From Numbers to Number Systems

Fernando Q. Gouvêa


People have been writing numbers down for as long as 
they have been writing. In every civilization that has 
developed a way of recording information, we also find 
a way of recording numbers. Some scholars even argue 
that numbers came first. 

It is fairly clear that numbers first arose as adjectives: 
they specified how many or how much of something 
there was. Thus, it was possible to talk about three 
apricots, say, long before it was possible to talk about 
the number 3. But once the concept of “threeness” is 
on the table, so that the same adjective specifies three 
fish and three horses, and once a written symbol such 
as “3” is developed that can be used in all of those 
instances, the conditions exist for 3 itself to emerge 
as an independent entity. Once it does, we are doing 
mathematics. 

This process seems to have repeated itself many 
times when new kinds of numbers have been intro- 
duced: first a number is used, then it is represented 
symbolically, and finally it comes to be conceived as a 
thing in itself and as part of a system of similar entities. 

1 Numbers in Early Mathematics 

The earliest mathematical documents we know about 
go back to the civilizations of the ancient Middle East, 
in Egypt and in Mesopotamia. In both cultures, a scribal 
class developed. Scribes were responsible for keeping 
records, which often required them to do arithmetic 
and solve simple mathematical problems. Most of the 
mathematical documents we have from those cultures 


seem to have been created for the use of young scribes 
learning their craft. Many of them are collections of 
problems, provided with either answers or brief solu- 
tions: twenty-five problems about digging trenches in 
one tablet, twelve problems requiring the solution of 
a linear equation in another, problems about squares 
and their sides in a third. 

Numbers were used both for counting and for mea- 
suring, so a need for fractional numbers must have 
come up fairly early. Fractions are complicated to write 
down, and computing with them can be difficult. Hence,
the problem of “broken numbers” may well have been 
the first really challenging mathematical problem. How 
does one write down fractions? The Egyptians and 
the Mesopotamians came up with strikingly different 
answers, both of which are also quite different from 
the way we write them today. 

In Egypt (and later in Greece and much of the Mediter- 
ranean world), the fundamental notion was “the nth 
part,” as in “the third part of six is two.” In this lan- 
guage, one would express the idea of dividing 7 by 3 
as, “What is the third part of seven?” The answer is, 
“Two and the third.” The process was complicated by an 
additional restriction: one never recorded a final result 
using more than one of the same kind of part. Thus, the 
number we would want to express as “two fifth parts” 
would have to be given as “the third and the fifteenth.” 

In Mesopotamia, we find a very different idea, which 
may have arisen to allow easy conversion between dif- 
ferent kinds of units. First of all, the Babylonians had 
a way to generate symbols for all the numbers from 1 
to 59. For larger numbers, they used a positional sys- 
tem much like the one we use today, but based on 60 
rather than 10. So something like 1,20 means one sixty
and twenty units, that is, 1 × 60 + 20 = 80. The same
system was then extended to fractions, so that one half
was represented as thirty sixtieths. It is convenient to
mark the beginning of the fractional part with a semi-
colon, though this and the comma are a modern con-
vention that has no counterpart in the original texts.
Then, for example, 1;24,36 means 1 + 24/60 + 36/3600, the frac-
tion that we would more usually write as 141/100, or 1.41.
The Mesopotamian way of writing numbers is called 
a sexagesimal place-value system by analogy with the 
system we use today, which is, of course, a decimal 
place-value system. 
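
The place-value rule is mechanical enough to write down as a short conversion routine. This is a sketch in modern terms only: the function name and the convention of passing the whole and fractional places as two lists (standing in for the modern semicolon) are assumptions for illustration.

```python
from fractions import Fraction

def sexagesimal_value(whole_places, fractional_places=()):
    """Exact value of a sexagesimal numeral, e.g. 1;24,36 -> ([1], [24, 36])."""
    value = Fraction(0)
    for digit in whole_places:
        value = value * 60 + digit                    # each place is worth 60 times the next
    for position, digit in enumerate(fractional_places, start=1):
        value += Fraction(digit, 60 ** position)      # sixtieths, then 3600ths, ...
    return value

print(sexagesimal_value([1, 20]))        # the numeral 1,20
print(sexagesimal_value([1], [24, 36]))  # the numeral 1;24,36
print(sexagesimal_value([0], [30]))      # one half as thirty sixtieths
```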

Neither of these systems is really equipped to deal 
well with complicated numbers. In Mesopotamia, for 
example, only finite sexagesimal expressions were 
employed, so the scribes were not able to write down 
an exact value for the reciprocal of 7 because there is 
no finite sexagesimal expression for 1/7. In practice, this
meant that to divide by 7 required finding an approxi-
mate answer. The Egyptian “parts” system, on the other
hand, can represent any positive rational number, but
doing so may require a sequence of denominators that 
to our eyes looks very complicated. One of the sur- 
viving papyri includes problems that look designed to 
produce just such complicated answers. One of these 
answers is “14, the 4th, the 56th, the 97th, the 194th, 
the 388th, the 679th, the 776th,” which in modern nota-
tion is the fraction 14 28/97. It seems that the joy of com-
putation for its own sake became well-established very
early in the development of mathematics. 
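
Answers in the "parts" system can be checked with exact rational arithmetic. The sketch below adds up the parts in the papyrus answer quoted above, and also the earlier example of "the third and the fifteenth"; the function name is an illustrative assumption, and the check for repeated parts reflects the restriction that no kind of part is used more than once.

```python
from fractions import Fraction

def sum_of_parts(denominators, whole=0):
    """Add a whole number and a list of distinct unit fractions ("parts")."""
    if len(set(denominators)) != len(denominators):
        raise ValueError("each kind of part may be used only once")
    return whole + sum(Fraction(1, d) for d in denominators)

two_fifths = sum_of_parts([3, 15])
papyrus_answer = sum_of_parts([4, 56, 97, 194, 388, 679, 776], whole=14)
print(two_fifths, papyrus_answer)
```

Every denominator in the long answer divides 56 × 97, which is why the sum collapses to something as simple as 14 + 28/97.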

Mediterranean civilizations preserved both of these 
systems for a while. Most everyday numbers were spec-
ified using the system of “parts.” On the other hand,
astronomy and navigation required more precision, so 
the sexagesimal system was used in those fields. This 
included measuring time and angles. The fact that we
still divide an hour into sixty minutes and a minute into 
sixty seconds goes back, via the Greek astronomers, 
to the Babylonian sexagesimal fractions; almost four 
thousand years later, we are still influenced by the 
Babylonian scribes. 

2 Lengths Are Not Numbers 

Things get more complicated with the mathematics 
of classical Greek and Hellenistic civilizations. The 
Greeks, of course, are famous for coming up with 
the first mathematical proofs. They were the first to 
attempt to do mathematics in a rigorously deductive 
way, using clear initial assumptions and careful state- 
ments. This, perhaps, is what led them to be very 
careful about numbers and their relations to other 
magnitudes.


Sometime before the fourth century b.c.e., the Greeks 
made the fundamental discovery of “incommensurable 
magnitudes.” That is, they discovered that it is not 
always possible to express two given lengths as (inte- 
ger) multiples of a third length. It is not just that lengths 
and numbers are conceptually distinct things (though
this was important too). The Greeks had found a proof 
that one cannot use numbers to represent lengths. 

Suppose, they argued, you have two line segments. 
If their lengths are both given by numbers, then those 
numbers will at worst involve some fractions. By chang- 
ing the unit of length, then, we can make sure that both 
of the lengths correspond to whole numbers. In other 
words, it must be possible to choose a unit length so 
that each of our segments consists of a whole number 
multiple of the unit. The two segments, then, could be 
“measured together,” i.e., would be “commensurable.” 

Now here’s the catch: the Greeks could prove that this 
was not always the case. Their standard example had to 
do with the side and the diagonal of a square. We do not 
know exactly how they first established that these two
segments are not commensurable, but it might have 
been something like this: if you subtract the side from 
the diagonal, you will get a segment shorter than either 
of them; if both side and diagonal are measured by a 
common unit, then so is the difference. Now repeat the 
argument: take the remainder and subtract it from the 
side until we get a second remainder smaller than the 
first (it can be subtracted twice, in fact). The second
remainder will also be measured by the common unit. 
It turns out to be quite easy to show that this process 
will never terminate; instead, it will produce smaller and 
smaller remainder segments. Eventually, the remainder 
segment will be smaller than the unit that supposedly 
measures it a whole number of times. That is impossi- 
ble (no whole number is smaller than 1, after all), and 
hence we can conclude that the common unit does not, 
in fact, exist.
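
The repeated-subtraction process can be followed exactly in modern terms. The sketch below makes an assumption the Greeks did not: it represents each length as a pair of integers (a, b) standing for a + b·√2, with the side as the unit, so that every comparison uses integer arithmetic only. All function names are hypothetical. It records how many times each remainder can be subtracted, and the pattern 1, 2, 2, 2, ... is consistent with a process that never terminates.

```python
def is_positive(x):
    """Exact sign test for x = (a, b), which stands for a + b*sqrt(2)."""
    a, b = x
    if a >= 0 and b >= 0:
        return (a, b) != (0, 0)
    if a <= 0 and b <= 0:
        return False
    if a > 0:                    # here b < 0: positive iff a*a > 2*b*b
        return a * a > 2 * b * b
    return 2 * b * b > a * a     # here a < 0 < b

def sub(x, y):
    return (x[0] - y[0], x[1] - y[1])

def repeated_subtraction(larger, smaller, steps):
    """Subtract the smaller length from the larger as often as it fits, then swap roles."""
    counts = []
    for _ in range(steps):
        if smaller == (0, 0):
            break                # a common measure would have been found
        count = 0
        while not is_positive(sub(smaller, larger)):   # while larger >= smaller
            larger = sub(larger, smaller)
            count += 1
        counts.append(count)
        larger, smaller = smaller, larger
    return counts, smaller

diagonal, side = (0, 1), (1, 0)          # sqrt(2) and 1
counts, remainder = repeated_subtraction(diagonal, side, 6)
print(counts)      # [1, 2, 2, 2, 2, 2]
print(remainder)   # still a nonzero length after six rounds
```

Running more steps only extends the list of 2s. The code cannot prove that the process goes on forever, but it shows the self-repeating pattern on which the argument relies.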

Of course, the diagonal does in fact have a length.
Today, we would say that if the length of the side is
one unit, then the length of the diagonal is $\sqrt{2}$ units,
and we would interpret this argument as showing that
the number $\sqrt{2}$ is not a fraction. The Greeks did not
quite see in what sense $\sqrt{2}$ could be a number. Instead,
it was a length, or, even better, the ratio between the 
length of the diagonal and the length of the side. Sim- 
ilar arguments could be applied to other lengths; for 
example, they knew that the sides of a square of area 1
and of a square of area 10 are incommensurable.

The conclusion, then, is that lengths are not numbers:
instead, they are some other kind of magnitude. But 
now we are faced with a proliferation of magnitudes: 
numbers, lengths, areas, angles, volumes, etc. Each of 
these must be taken as a different kind of quantity, not 
comparable with the others. 

This is a problem for geometry, particularly if we 
want to measure things. The Greeks solved this prob- 
lem by relying heavily on the notion of a ratio. Two 
quantities of the same type have a ratio, and this ratio 
was allowed to be equal to the ratio of two quantities of 
another type: equality of two ratios was defined using 
Eudoxus’s theory of proportion, the latter being one of 
the most important and deep ideas of Greek geometry. 
So, for example, rather than talking about a number
called $\pi$, which to them would not be a number at all,
they would say that “the ratio of the circle to the square
on its radius is the same as the ratio of the circumference
to the diameter.” Notice that one of the two ratios
is between two areas, the other between two lengths.
The number $\pi$ itself had no name in Greek mathematics,
but the Greeks did compare it with ratios between
numbers: Archimedes [VI.3] showed that it was just a
little bit less than the ratio of 22 to 7 and just a little
bit more than the ratio of 223 to 71.

Doing things this way seems ungainly to us, but it 
worked very well. Furthermore, it is philosophically sat- 
isfying to conceive of a great variety of magnitudes 
organized into various kinds (segments, angles, sur- 
faces, etc.). Magnitudes of the same kind can be related 
to one another by ratios, and ratios can be compared 
with each other because they are relations perceived by 
our minds. In fact, the word for ratio, both in Greek and
in Latin, is the same as the word for “reason” or “explanation”
(logos in Greek, ratio in Latin). From the beginning,
“irrational” (alogos in Greek) could mean both
“without a ratio” and “unreasonable.”

Inevitably, the austere system of the theoretical 
mathematicians was somewhat disconnected from the 
everyday needs of people who needed to measure 
things such as lengths and angles. Astronomers kept 
right on using sexagesimal approximations, as did map- 
makers and other scientists. There was some “leakage” 
of course: in the first century c.e., Heron of Alexandria 
wrote a book that reads like an attempt to apply the 
theoreticians’ discoveries to practical measurement. It 
is to him, for example, that we owe the recommendation
to use $\frac{22}{7}$ as an approximation for $\pi$. (Presumably,
he chose Archimedes' upper bound because it was the
simpler number.) In theoretical mathematics, however,
the distinction between numbers and other kinds of
magnitudes remained firm.

The history of numbers in the West over the fifteen 
hundred years that followed the classical Greek period 
can be seen as having two main themes: first, the Greek 
compartmentalization between different kinds of quan- 
tities was slowly demolished; second, in order to do this 
the notion of number had to be generalized over and 
over again. 

3 Decimal Place Value 

Our system for representing whole numbers goes back, 
ultimately, to the mathematicians of the Indian subcontinent.
Sometime before (probably well before) the fifth
century c.e., they created nine symbols to designate 
the numbers from one to nine and used the position 
of these symbols to indicate their actual value. So a 3 
in the units position meant three, and a 3 in the tens 
position meant three tens, i.e., thirty. This, of course, is 
what we still do; the symbols themselves have changed, 
but not the principle. At about the same time, a place 
marker was developed to indicate an unoccupied space; 
this eventually evolved into our zero.

Indian astronomy made extensive use of sines, which 
are almost never whole numbers. To represent these, 
a Babylonian-style sexagesimal system was used, with 
each “sexagesimal unit” being represented using the 
decimal system. So “thirty-three and a quarter” might 
be represented as 33 15′, i.e., 33 units and 15 “minutes”
(sixtieths). 

Decimal place-value numeration was passed on from 
India to the Islamic world fairly early. In the ninth century
c.e. in Baghdad, the recently established capital
of the caliphate, one finds al-Khwārizmī [VI.5] writing
a treatise on numeration in the Indian style, “using
nine symbols.” Several centuries later, al-Khwārizmī’s
treatise was translated into Latin. It was so popular 
and influential in late-medieval Europe that decimal 
numeration was often referred to as “algorism.” 

It is worth noting that in al-Khwārizmī’s writing zero
still had a special status: it was a place holder, not
a number. But once we have a symbol, and we start
doing arithmetic using these symbols, the distinction
quickly disappears. We have to know how to add and 
multiply numbers by zero in order to multiply multi- 
digit numbers. In this way, “nothing” slowly became a 
number. 

4 What People Want Is a Number

As Greek culture was displaced by other influences, the 
practical tradition became more important. One can see 
this in al-Khwārizmī’s other famous book, whose title
gave us the word “algebra.” The book is actually a compendium
of many different kinds of practical or semi-practical
mathematics problems. Al-Khwārizmī opens
the book with a declaration that tells us at once that we 
are no longer in the Greek mathematical world: “When 
I considered what people generally want in calculating, 
I found that it is always a number.” 

The first portion of al-Khwārizmī’s book deals with
quadratic equations and with the algebraic manipula- 
tions (done entirely in words, with no symbols whatso- 
ever) needed to deal with them. His procedure is exactly 
the quadratic formula we still use, which of course 
requires extracting a square root. But in every example 
the number whose square root we need to find turns 
out to be a square, so that the square root is easily 
found, and al-Khwārizmī does get a number!

At other points in the book, however, we can see 
that al-Khwārizmī is beginning to think of irrational
square roots as number-like entities. He teaches the
reader how to manipulate symbols with square roots
in them, and gives (in words, of course) examples such
as $(20 - \sqrt{200}) + (\sqrt{200} - 10) = 10$. In the second part of
the book, which deals with geometry and measurement, 
one even sees an approximation to a square root: “The 
product is one thousand eight hundred and seventy- 
five; take its root, it is the area; it is forty-three and a 
little.” 

The mathematicians of medieval Islam were influ- 
enced not only by the practical tradition represented 
by al-Khwārizmī, but also by the Greek tradition, especially
Euclid's [VI.2] Elements. One finds in their writing
a mixture of Greek precision and a more practical
approach to measurement. In Omar Khayyam’s
Algebra, for example, one sees both theorems in the 
Greek style and the desire for numerical solutions. In 
his discussion of cubic equations Khayyam manages to 
find solutions by means of geometric constructions but 
laments his inability to find numerical values. 

Slowly, however, the realm of “number” began to 
grow. The Greeks might have insisted that $\sqrt{10}$ was not
a number, but rather a name for a line segment, the
side of a square whose area is 10, or a name for a ratio.
Among the medieval mathematicians, both in Islam and
in Europe, $\sqrt{10}$ started to behave more and more like a
number, entering into operations and even appearing
as the solution of certain problems.

5 Giving Equal Status to All Numbers 

The idea of extending the decimal place-value system to 
include fractions was discovered by several mathematicians.
The most influential of these was Stevin [VI.10],
a Flemish mathematician and engineer who popular- 
ized the system in a booklet called De Thiende (“The 
tenth”), first published in 1585. By extending place 
value to tenths, hundredths, and so on, Stevin created 
the system we still use today. More importantly, he 
explained how it simplified calculations that involved 
fractions, and gave many practical applications. The 
cover page, in fact, announces that the book is for
“astrologers, surveyors, measurers of tapestries.” 

Stevin was certainly aware of some of the issues created
by his move. He knew, for example, that the decimal
expansion for $\frac{1}{3}$ was infinitely long; his discussion
simply says that while it might be more correct
to say that the full infinite expansion was the correct 
representation, in practice it made little difference if 
we truncated it. 

Stevin was also aware that his system provided a way 
to attach a “number” (meaning a decimal expansion) 
to every single length. He saw little difference between 
1.1764705882 (the beginning of the decimal expansion
of $\frac{20}{17}$) and 1.4142135623 (the beginning of the decimal
expansion of $\sqrt{2}$). In his Arithmetic he boldly declared
that all (positive) numbers were squares, cubes, fourth 
powers, etc., and that roots were just numbers. He also 
says that “there are no absurd, irrational, irregular, 
inexplicable, or surd numbers.” Those were all terms 
used for irrational numbers, i.e., numbers that are not 
fractions. 
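
The two ten-place expansions quoted above can be reproduced with integer arithmetic alone, which fits Stevin's point that a fraction and a square root look alike once written as decimals. This is a small illustrative sketch; the helper name is hypothetical, and it assumes the value being printed is at least 1.

```python
from math import isqrt

def truncated_decimal(scaled_integer, digits):
    """Insert a decimal point `digits` places from the right (assumes the value is >= 1)."""
    s = str(scaled_integer)
    return s[:-digits] + "." + s[-digits:]

digits = 10
# floor(20/17 * 10**10), computed exactly with integer division
ratio = truncated_decimal(20 * 10**digits // 17, digits)
# floor(sqrt(2) * 10**10), computed exactly with the integer square root
root = truncated_decimal(isqrt(2 * 10**(2 * digits)), digits)

print(ratio)  # 1.1764705882
print(root)   # 1.4142135623
```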

What Stevin was proposing, then, was to flatten the
incredible diversity of “quantities” or “magnitudes” 
into one expansive notion of number, defined by dec- 
imal expansions. He was aware that these numbers 
could be represented as lengths along a line. This 
amounted to a fairly clear notion of what we now call 
the positive real numbers. 

Stevin’s proposal was made immensely more influ- 
ential by the invention of logarithms. Like the sine and 
the cosine, these were practical computational tools. In 
order to be used, they needed to be tabulated, and the 
tables were given in decimal form. Very soon, everyone 
was using decimal representation. 

It was only much later that it came to be understood
what a bold leap this move represented. The positive 
real numbers are not just a larger number system; they 
are an immensely larger number system, whose internal
complexity we still do not fully understand (see set
theory [IV.1]).

6 Real, False, Imaginary 

Even as Stevin was writing, the next steps were being 
taken: under the pressure of the theory of equations, 
negative numbers and complex numbers began to be 
useful. Stevin himself was already aware of negative 
numbers, though he was clearly not quite comfortable 
with them. For example, he explained that the fact that
$-3$ is a root of $x^2 + x - 6$ really means that 3 is a root
of the associated polynomial $x^2 - x - 6$, obtained by
replacing $x$ by $-x$ everywhere.

This was an easy dodge, but cubic equations created
more difficult problems. The work of several Italian
mathematicians of the sixteenth century led to a
method for solving cubic equations. As a crucial step, 
this method involved extracting a square root. The 
problem was that the number whose root was needed 
sometimes came out negative. 

Up until then, it had always turned out that when an 
algebraic problem led to the extraction of the square 
root of a negative number, the problem simply had no 
solution. But the equation $x^3 = 15x + 4$ clearly did
have a solution (indeed, $x = 4$ is one); it was just that
applying the cubic formula required computing $\sqrt{-121}$.

It was Bombelli [VI.8], also a mathematician and
engineer, who decided to bite the bullet and just see 
what happened. In his Algebra, published in 1572, he 
went ahead and computed with this “new kind of rad- 
ical” and showed that he could find the solution of 
the cubic in this way. This showed that the cubic for- 
mula did indeed work in this case; more importantly, 
it showed that these strange new numbers could be 
useful. 
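
The computation can be retraced with modern complex arithmetic. The sketch below is not Bombelli's own procedure, which was carried out by hand and in words. It assumes the usual form of the cubic formula for an equation $x^3 = px + q$, and it relies on Python's principal cube roots, which happen to pick out the right pair of roots in this particular case. The helper name is hypothetical.

```python
import cmath

def gaussian_mul(x, y):
    """Multiply a + b*i and c + d*i, given as integer pairs (a, b) and (c, d)."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)

# Exact check of the identity behind the calculation: (2 + i)^3 = 2 + 11i.
cube = gaussian_mul(gaussian_mul((2, 1), (2, 1)), (2, 1))
print(cube)  # (2, 11)

# Cubic formula for x^3 = p*x + q with p = 15 and q = 4.
p, q = 15, 4
inner = cmath.sqrt((q / 2) ** 2 - (p / 3) ** 3)   # square root of -121, i.e. 11i
u = (q / 2 + inner) ** (1 / 3)                    # principal cube root, close to 2 + i
v = (q / 2 - inner) ** (1 / 3)                    # principal cube root, close to 2 - i
x = u + v                                         # imaginary parts cancel, leaving about 4
print(abs(x - 4) < 1e-9)  # True
```

For other values of p and q the principal cube roots need not be the matching pair, so this should be read as a check of one example rather than a general solver.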

It took a while for people to become comfortable with 
these new quantities. About fifty years later, we find 
both Albert Girard and Descartes [VI.11] saying that
equations can have three sorts of roots: true (mean- 
ing positive), false (negative), and imaginary. It is not 
completely clear that they understood that these imag- 
inary roots would be what we now call complex num- 
bers; Descartes, at least, sometimes seems to be saying 
that an equation of degree n must have n roots, and
that the ones that are neither “true” nor “false” must
simply be imagined.

Slowly, however, complex numbers began to be used. 
They came up in the theory of equations, in debates 
about the logarithms of negative numbers, and in con- 
nection to trigonometry. Their connection with the sine 
and cosine functions (via the exponential) was turned
into a powerful tool by Euler [VI.19] in the eighteenth
century. By the middle of the eighteenth century, it was
well known that every polynomial had a complete set
of roots in the complex numbers. This result became
known as the fundamental theorem of algebra
[V.15]; it was finally proved to everyone’s satisfaction
by Gauss [VI.26]. Thus, the theory of equations did not
seem to require any further extension of the notion of 
number. 

7 Number Systems, Old and New 

Since complex numbers are clearly different from real 
numbers, their presence stimulated people to begin 
classifying numbers into different kinds. Stevin's egal- 
itarianism had its impact, but it could not quite erase 
the fact that whole numbers are nicer than decimals,
and that fractions are generally easier to grasp than 
irrational numbers. 

In the nineteenth century, all sorts of new ideas created
the need for a more careful look at this classification.
In number theory, Gauss and Kummer [VI.40]
started looking at subsets of the complex numbers that
behaved in a way analogous to the integers, such as the
set of all numbers $a + b\sqrt{-1}$ with a and b both integers.

In the theory of equations, Galois [VI.41] pointed out
that in order to do a careful analysis of the solvability of 
an equation one must start by agreeing on what num- 
bers count as “rational.” So, for example, he pointed out 
that in Abel’s [VI.33] theorem on the unsolvability of
the quintic, “rational” meant “expressible as a quotient 
of polynomials in the symbols used as the coefficients 
of the equation,” and he noted that the set of all such 
expressions obeyed the usual rules of arithmetic. 

In the eighteenth century, Johann Lambert had established
that e and $\pi$ were irrational, and conjectured
that in fact they were transcendental, that is, that they
were not roots of any polynomial equation with integer
coefficients. Even the existence of transcendental numbers
was not known at the time; Liouville [VI.39] proved in 1844
that such numbers exist. Within a few decades, it was proved that
both e and $\pi$ were transcendental, and later in the century
Cantor [VI.54] showed that in fact the vast majority
of real numbers were transcendental. Cantor’s discovery
highlighted, for the first time, that the system
Stevin had popularized contained unexpected depths.

Perhaps the most important change in the concept of 
number, however, came after Hamilton’s [VI.37] discovery,
in 1843, of a completely new number system.
Hamilton had noticed that coordinatizing the plane 
using complex numbers (rather than simply using pairs 
of real numbers) vastly simplified plane geometry. He 
set out to find a similar way to parametrize three- 
dimensional space. This turned out to be impossible, 
but led Hamilton to a four-dimensional system, which
he called the quaternions [III.78]. These behaved 
much like numbers, with one crucial difference: multiplication
was not commutative, that is, if $q$ and $q'$ are
quaternions, $qq'$ and $q'q$ are usually not the same.
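
The failure of commutativity is easy to see in a small computation. The sketch below assumes the standard modern description of a quaternion as w + xi + yj + zk, stored as a 4-tuple, with the usual multiplication rules for i, j, and k. The function name is a hypothetical choice for this illustration.

```python
def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) = w + x*i + y*j + z*k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

print(qmul(i, j))  # (0, 0, 0, 1), which is k
print(qmul(j, i))  # (0, 0, 0, -1), which is -k
print(qmul(i, i))  # (-1, 0, 0, 0), so i squares to -1 as in the complex numbers
```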

The quaternions were the first system of “hyper- 
complex numbers,” and their appearance generated 
lots of new questions. Were there other such systems? 
What counts as a number system? If certain “numbers” 
can fail to satisfy the commutative law, can we make 
numbers that break other rules? 

In the long run, this intellectual ferment led math- 
ematicians to let go of the vague notion of “number” 
or “quantity” and to hold on, instead, to the more for- 
mal notion of an algebraic structure. Each of the num- 
ber systems, in the end, is simply a set of entities on 
which we can do operations. What makes them inter- 
esting is that we can use them to parametrize, or coor- 
dinatize, systems that interest us. The whole numbers 
(or integers, to give them their latinized formal name), 
for example, formalize the notion of counting, while 
the real numbers parametrize the line and serve as the 
basis for geometry. 

By the beginning of the twentieth century, there were 
many well-known number systems. The integers had 
pride of place, followed by a nested hierarchy con- 
sisting of the rational numbers (i.e., the fractions), the 
real numbers (Stevin’s decimals, now carefully formal- 
ized), and the complex numbers. Still more general than 
the complex numbers were the quaternions. But these 
were by no means the only systems around. Number 
theorists worked with several different fields of algebraic
numbers, subsets of the complex numbers that
could be understood as autonomous systems. Galois
had introduced finite systems that obeyed the usual
rules of arithmetic, which we now call finite fields. Function
theorists worked with fields of functions; they certainly
did not think of these as numbers, but their
analogy to number systems was known and exploited.


Early in the twentieth century, Kurt Hensel introduced
the p-adic numbers [III.53], which were built
from the rational numbers by giving a special role to a
prime number p. (Since p can be chosen at will, Hensel
in fact created infinitely many new number systems.)
These too “obeyed the usual rules of arithmetic,” in
the sense that addition and multiplication behaved as
expected; in modern language, they were fields. The
p-adics provided the first system of things that were
recognizably numbers but that had no visible relation
to the real or complex numbers, apart from the fact
that both systems contained the rational numbers. As
a result, they led Ernst Steinitz to create an abstract
theory of fields.

The move to abstraction that appears in Steinitz’s 
work had also occurred in other parts of mathemat- 
ics, most notably the theory of groups and their repre- 
sentations and the theory of algebraic numbers. All of 
these theories were brought together into conceptual 
unity by Noether [VI.76], whose program came to be
known as “abstract algebra.” This left numbers behind 
completely, focusing instead on the abstract structure 
of sets with operations. 

Today, it is no longer that easy to decide what counts 
as a “number.” The objects from the original sequence 
of “integer, rational, real, and complex” are certainly 
numbers, but so are the p-adics. The quaternions are 
rarely referred to as “numbers,” on the other hand,
though they can be used to coordinatize certain mathematical
notions. In fact, even stranger systems can
show up as coordinates, such as Cayley’s octonions
[III.78]. In the end, whatever serves to parametrize or
coordinatize the problem at hand is what we use. If the
requisite system turns out not to exist yet, well, one 
just has to invent it. 

1 Introduction

The modern view of geometry was inspired by the novel 
geometrical theories of Hilbert [VI.63] and Einstein in
the early years of the twentieth century, which built in 
their turn on other radical reformulations of geometry 
in the nineteenth century. For thousands of years, the 
geometrical knowledge of the Greeks, as set out most 
notably in Euclid’s [VI.2] Elements, was held up as a
paradigm of perfect rigor, and indeed of human know- 
ledge. The new theories amounted to the overthrow of 
an entire way of thinking. This essay will pursue the his- 
tory of geometry, starting from the time of Euclid, con- 
tinuing with the advent of non-Euclidean geometry, and 
ending with the work of Riemann [VI.49], Klein [VI.57],
and Poincaré [VI.61]. Along the way, we shall examine
how and why the notions of geometry changed so
remarkably. Modern geometry itself will be discussed 
in later parts of this book. 

2 Naive Geometry 

Geometry generally, and Euclidean geometry in partic- 
ular, is informally and rightly taken to be the math- 
ematical description of what you see all around you: 
a space of three dimensions (left-right, up-down, for- 
wards-backwards) that seems to extend indefinitely 
far. Objects in it have positions, they sometimes move 
around and occupy other positions, and all of these 
positions can be specified by measuring lengths along 
straight lines: this object is twenty meters from that 
one, it is two meters tall, and so on. We can also measure
angles, and there is a subtle relationship between
angles and lengths. Indeed, there is another aspect
to geometry, which we do not see but which we rea- 
son about. Geometry is a mathematical subject that is 
full of theorems (the isosceles triangle theorem, the
Pythagorean theorem, and so on) which collectively
summarize what we can say about lengths, angles, 
shapes, and positions. What distinguishes this aspect 
of geometry from most other kinds of science is its 
highly deductive nature. It really seems that by tak- 
ing the simplest of concepts and thinking hard about 
them one can build up an impressive, deductive body 
of knowledge about space without having to gather 
experimental evidence. 

But can we? Is it really as simple as that? Can we have 
genuine knowledge of space without ever leaving our 
armchairs? It turns out that we cannot: there are other 
geometries, also based on the concepts of length and 
angle, that have every claim to be useful, but that dis- 
agree with Euclidean geometry. This is an astonishing 
discovery of the early nineteenth century, but, before it 
could be made, a naive understanding of fundamental 
concepts, such as straightness, length, and angle, had 
to be replaced by more precise definitions, a process
that took many hundreds of years. Once this had been 
done, first one and then infinitely many new geometries 
were discovered. 

3 The Greek Formulation 

Geometry can be thought of as a set of useful facts 
about the world, or else as an organized body of know- 
ledge. Either way, the origins of the subject are much 
disputed. It is clear that the civilizations of Egypt and 
Babylonia had at least some knowledge of geometry— 
otherwise, they could not have built their large cities, 
elaborate temples, and pyramids. But not only is it dif- 
ficult to give a rich and detailed account of what was 
known before the Greeks, it is difficult even to make 
sense of the few scattered sources that we have from 
before the time of Plato and Aristotle. One reason for 
this is the spectacular success of the later Greek writer, 
and author of what became the definitive text on geom- 
etry, Euclid of Alexandria (ca. 300 b.c.e.). One glance at 
his famous Elements shows that a proper account of 
the history of geometry will have to be about some- 
thing much more than the acquisition of geometrical 
facts. The Elements is a highly organized, deductive 
body of knowledge. It is divided into a number of distinct
themes, but each theme has a complex theoretical
structure. Thus, whatever the origins of geometry
might have been, by the time of Euclid it had
become the paradigm of a logical subject, offering a
kind of knowledge quite different from, and seemingly 
higher than, knowledge directly gleaned from ordinary 
experience. 

Rather, therefore, than attempt to elucidate the early 
history of geometry, this essay will trace the high road 
of geometry's claim on our attention: the apparent certainty
of mathematical knowledge. It is exactly this
claim to a superior kind of knowledge that led even- 
tually to the remarkable discovery of non-Euclidean 
geometry: there are geometries other than Euclid’s that
are every bit as rigorously logical. Even more remark- 
ably, some of these turn out to provide better models 
of physical space than Euclidean geometry. 

The Elements opens with four books on the study 
of plane figures: triangles, quadrilaterals, and circles. 
The famous theorem of Pythagoras is the forty-seventh 
proposition of the first book. Then come two books on 
the theory of ratio and proportion and the theory of 
similar figures (scale copies), treated with a high degree 
of sophistication. The next three books are about whole 
numbers, and are presumably a reworking of much 
older material that would now be classified as elemen- 
tary number theory. Here, for example, one finds the 
famous result that there are infinitely many prime num- 
bers. The next book, the tenth, is by far the longest, 
and deals with the seemingly specialist topic of lengths 
of the form $\sqrt{a \pm \sqrt{b}}$ (to write them as we would). The
final three books, where the curious lengths studied in 
Book X play a role, are about three-dimensional geom- 
etry. They end with the construction of the five regular 
solids and a proof that there are no more. The discov- 
ery of the fifth and last had been one of the topics that 
excited Plato. Indeed, the five regular solids are crucial 
to the cosmology of Plato’s late work the Timaeus. 

Most books of the Elements open with a number 
of definitions, and each has an elaborate deductive 
structure. For example, to understand the Pythagorean 
theorem, one is driven back to previous results, and 
thence to even earlier results, until finally one comes 
to rest on basic definitions. The whole structure is 
quite compelling: reading it as an adult turned the 
philosopher Thomas Hobbes from incredulity to last- 
ing belief in a single sitting. What makes the Elements 
so convincing is the nature of the arguments employed. 
With some exceptions, mostly in the number-theoretic 
books, these arguments use the axiomatic method. 
That is to say, they start with some very simple axioms 
that are intended to be self-evidently true, and proceed 
by purely logical means to deduce theorems from them. 


For this approach to work, three features must be 
in place. The first is that circularity should be care- 
fully avoided. That is, if you are trying to prove a state- 
ment P and you deduce it from an earlier statement, 
and deduce that from a yet earlier statement, and so 
on, then at no stage should you reach the statement 
P again. That would not prove P from the axioms, but 
merely show that all the statements in your chain were 
equivalent. Euclid did a remarkable job in this respect. 

The second necessary feature is that the rules of 
inference should be clear and acceptable. Some geomet- 
rical statements seem so obvious that one can fail to 
notice that they need to be proved: ideally, one should 
use no properties of figures other than those that have 
been clearly stated in their definitions, but this is a diffi- 
cult requirement to meet. Euclid’s success here was still 
impressive, but mixed. On the one hand, the Elements
is a remarkable work, far outstripping any contempo- 
rary account of any of the topics it covers, and capable 
of speaking down the millennia. On the other, it has 
little gaps that from time to time later commentators 
would fill. For example, it is neither explicitly assumed 
nor proved in the Elements that two circles will meet 
if their centers lie outside each other and the sum of 
their radii is greater than the distance between their 
centers. However, Euclid is surprisingly clear that there 
are rules of inference that are of general, if not indeed 
universal, applicability, and others that apply to math- 
ematics because they rely on the meanings of the terms 
involved. 

The third feature, not entirely separable from the 
second, is adequate definitions. Euclid offered two, or 
perhaps three, sorts of definition. Book I opens with 
seven definitions of objects, such as “point” and “line,” 
that one might think were primitive and beyond def- 
inition, and it has recently been suggested that these 
definitions are later additions. Then come, in Book I 
and again in many later books, definitions of familiar 
figures designed to make them amenable to mathemat- 
ical reasoning: “triangle,” “quadrilateral,” “circle,” and 
so on. The postulates of Book I form the third class of 
definition and are rather more problematic. 

Book I states five “common notions,” which are rules
of inference of a very general sort. For example, “If 
equals be added to equals, the wholes are equals.” The 
book also has five “postulates,” which are more nar- 
rowly mathematical. For example, the first of these 
asserts that one may draw a straight line from any point 
to any point. One of these postulates, the fifth, became 
notorious: the so-called parallel postulate. It says that 
“If a straight line falling on two straight lines make the
interior angles on the same side less than two right 
angles, the two straight lines, if produced indefinitely, 
meet on that side on which are the angles less than two 
right angles.” 

Parallel lines, therefore, are straight lines that do not 
meet. A helpful rephrasing of Euclid’s parallel postulate 
was introduced by the Scottish editor, Robert Simson. It 
appears in his edition of Euclid’s Elements from 1806. 
There he showed that the parallel postulate is equiva- 
lent, if one assumes those parts of the Elements that 
do not depend on it, to the following statement: given 
any line m in a plane, and any point P in that plane that 
does not lie on the line m, there is exactly one line n 
in the plane that passes through the point P and does 
not meet the line m. From this formulation it is clear 
that the parallel postulate makes two assertions: given 
a line and a point as described, a parallel line exists and 
it is unique. 

It is worth noting that Euclid himself was probably 
well aware that the parallel postulate was awkward. It 
asserts a property of straight lines that seems to have 
made Greek mathematicians and philosophers uncom- 
fortable, and this may be why its appearance in the Ele- 
ments is delayed until proposition 29 of Book I. The 
commentator Proclus (fifth century c.e.), in his exten- 
sive discussion of Book I of the Elements, observed that 
the hyperbola and asymptote get closer and closer as
they move outwards, but they never meet. If a line and a 
curve can do this, why not two lines? The matter needs 
further analysis. Unfortunately, not much of the Ele- 
ments would be left if mathematicians dropped the par- 
allel postulate and retreated to the consequences of the 
remaining definitions: a significant body of knowledge 
depends on it. Most notably, the parallel postulate is 
needed to prove that the angles in a triangle add up to 
two right angles— a crucial result in establishing many 
other theorems about angles in figures, including the 
Pythagorean theorem. 

Whatever claims educators may have made about 
Euclid's Elements down the ages, a significant number 
of experts knew that it was an unsatisfactory compro- 
mise: a useful and remarkably rigorous theory could be 
had, but only at the price of accepting the parallel pos- 
tulate. But the parallel postulate was difficult to accept 
on trust: it did not have the same intuitively obvious 
feel of the other axioms and there was no obvious way 
of verifying it. The higher one’s standards, the more 
painful this compromise was. What, the experts asked, 
was to be done? 


One Greek discussion must suffice here. In Proclus’s 
view, if the truth of the parallel postulate was not obvi- 
ous, and yet geometry was bare without it, then the only 
possibility was that it was true because it was a theo- 
rem. And so he gave it a proof. He argued as follows. Let 
two lines m and n cross a third line k at P and Q, respec- 
tively, and make angles with it that add up to two right 
angles. Now draw a line l that crosses m at P and enters
the space between the lines m and n. The distance
between l and m as one moves away from the point P
continually increases, said Proclus, and therefore line l
must eventually cross line n. 

Proclus’s argument is flawed. The flaw is subtle, and 
sets us up for what is to come. He was correct that 
the distance between the lines l and m increases
indefinitely. But his argument assumes that the distance
between lines m and n does not also increase indefi- 
nitely, and is instead bounded. Now Proclus knew very 
well that if the parallel postulate is granted, then it can 
be shown that the lines m and n are parallel and that 
the distance between them is a constant. But until the 
parallel postulate is proved, nothing prevents one say- 
ing that the lines m and n diverge. Proclus’s proof does 
not therefore work unless one can show that lines that 
do not meet also do not diverge. 

Proclus’s attempt was not the only one, but it is typi- 
cal of such arguments, which all have a standard form. 
They start by detaching the parallel postulate from 
Euclid’s Elements, together with all the arguments and 
theorems that depend on it. Let us call what remains 
the “core” of the Elements. Using this core, an attempt 
is then made to derive the parallel postulate as a the- 
orem. The correct conclusion to be derived from Pro- 
clus’s attempt is not that the parallel postulate is a the- 
orem, but rather that, given the core of the Elements, 
the parallel postulate is equivalent to the statement 
that lines that do not meet also do not diverge. Aganis, 
a writer of the sixth century c.e. about whom almost 
nothing is known, assumed, in a later attempt, that par- 
allel lines are everywhere equidistant, and his argument 
showed only that, given the core, the Euclidean defini- 
tion of parallel lines is equivalent to defining them to 
be equidistant. 

Notice that one cannot even enter this debate unless 
one is clear which properties of straight lines belong to 
them by definition, and which are to be derived as the- 
orems. If one is willing to add to the store of “common- 
sense” assumptions about geometry as one goes along, 
the whole careful deductive structure of the Elements 
collapses into a pile of facts. 
This deductive character of the Elements is clearly
something that Euclid regarded as important, but one 
can also ask what he thought geometry was about. Was 
it meant, for example, as a mathematical description 
of space? No surviving text tells us what he thought 
about this question, but it is worth noting that the most 
celebrated Greek theory of the universe, developed by 
Aristotle and many later commentators, assumed that 
space was finite, bounded by the sphere of the fixed 
stars. The mathematical space of the Elements is infi- 
nite, and so one has at least to consider the possibility 
that, for all these writers, mathematical space was not 
intended as a simple idealization of the physical world. 

4 Arab and Islamic Commentators 

What we think of today as Greek geometry was the 
work of a handful of mathematicians, mostly concen- 
trated in a period of less than two centuries. They were 
eventually succeeded by a somewhat larger number of 
Arabic and Islamic writers, spread out over a much 
greater area and a longer time. These writers tend to be 
remembered as commentators on Greek mathematics 
and science, and for transmitting them to later Western
authors, but they should also be remembered as
creative, innovative mathematicians and scientists in
their own right. A number of them took up the study 
of Euclid’s Elements, and with it the problem of the par- 
allel postulate. They too took the view that it was not 
a proper postulate, but one that could be proved as a 
theorem using the core alone. 

Among the first to attempt a proof was Thābit
ibn Qurra. He was a pagan from near Aleppo who lived 
and worked in Baghdad, where he died in 901. Here 
there is room to describe only his first approach. He 
argued that if two lines m and n are crossed by a third, 
k, and if they approach each other on one side of the 
line k, then they diverge indefinitely on the other side 
of k. He deduced that two lines that make equal alter- 
nate angles with a transversal (the marked angles in 
figure 1) cannot approach each other on one side of a 
transversal: the symmetry of the situation would imply 
that they approached on the other side as well, but 
he had shown that they would have to diverge on the 
other side. From this he deduced the Euclidean theory 
of parallels, but his argument was also flawed, since he 
had not considered the possibility that two lines could 
diverge in both directions. 

Figure 1 The lines m and n make equal alternate
angles a and b with the transversal k.

Figure 2 AB and CD are equal, the angle ADC is a right
angle, A'B' is an intermediate position of AB as it moves
toward CD.

The distinguished Islamic mathematician and scientist
ibn al-Haytham was born in Basra in 965 and died
in Egypt in 1041. He took a quadrilateral with two equal
sides perpendicular to the base and dropped a perpendicular
from one side to the other. He now attempted
to prove that this perpendicular is equal to the base,
and to do so he argued that as one of the two original
perpendiculars is moved toward the other, its tip sweeps
out a straight line, which will coincide with the per- 
pendicular just dropped (see figure 2). This amounts 
to the assumption that the curve everywhere equidis- 
tant from a straight line is itself straight, from which 
the parallel postulate easily follows, and so his attempt 
fails. His proof was later heavily criticized by Omar 
Khayyam for its use of motion, which he found fun- 
damentally unclear and alien to Euclid’s Elements. It 
is indeed quite distinct from any use Euclid had for
motion in geometry, because in this case the nature
of the curve obtained is not clear: it is precisely what
needs to be analyzed.

The last of the Islamic attempts on the parallel pos- 
tulate is due to Nasir al-Din al-Tusi. He was born in Iran 
in 1201 and died in Baghdad in 1274. His extensive 
commentary is also one of our sources of knowledge
of earlier Islamic mathematical work on this subject. 
Al-Tusi focused on showing that if two lines begin to 
converge, then they must continue to do so until they 
eventually meet. To this end he set out to show that 

(*) if l and m are two lines that make an angle of less
than a right angle, then every line perpendicular
to l meets the line m.

He showed that if (*) is true, then the parallel postulate
follows. However, his argument for (*) is flawed.

It is genuinely difficult to see what is wrong with 
some of these arguments if one uses only the tech- 
niques available to mathematicians of the time. Islamic 
mathematicians showed a degree of sophistication that 
was not to be surpassed by their Western successors 
until the eighteenth century. Unfortunately, however, 
their writings did not come to the attention of the West 
until much later, with the exception of a single work 
in the Vatican Library, published in 1594, which was
for many years erroneously attributed to al-Tusi (and 
which may have been the work of his son). 

5 The Western Revival of Interest 

The Western revival of interest in the parallel postu- 
late came with the second wave of translations of Greek 
mathematics, led by Commandino and Maurolico in the 
sixteenth century and spread by the advent of print- 
ing. Important texts were discovered in a number of 
older libraries, and ultimately this led to the produc- 
tion of new texts of Euclid’s Elements. Many of these 
had something to say about the problem of parallels, 
pithily referred to by Henry Savile as “a blot on Euclid.” 
For example, the powerful Jesuit Christopher Clavius, 
who edited and reworked the Elements in 1574, tried to 
argue that parallel lines could be defined as equidistant
lines. 

The ready identification of physical space with the
space of Euclidean geometry came about gradually dur- 
ing the sixteenth and seventeenth centuries, after the 
acceptance of Copernican astronomy and the abolition
of the so-called sphere of fixed stars. It was canonized
by Newton [VI.14] in his Principia Mathematica,
which proposed a theory of gravitation that was
firmly situated in Euclidean space. Although Newto- 
nian physics had to fight for its acceptance, Newto- 
nian cosmology had a smooth path and became the 
unchallenged orthodoxy of the eighteenth century. It 
can be argued that this identification raised the stakes,
because any unexpected or counterintuitive conclusion
drawn solely from the core of the Elements was now,
possibly, a counterintuitive fact about space.

In 1663 the English mathematician John Wallis took 
a much more subtle view of the parallel postulate than 
any of his predecessors. He had been instructed by Hal- 
ley, who could read Arabic, in the contents of the apoc- 
ryphal edition of al-Tusi’s work in the Vatican Library, 
and he too gave an attempted proof. Unusually, Wallis 
also had the insight to see where his own argument was 
flawed, and commented that what it really showed was 
that, in the presence of the core, the parallel postulate 
was equivalent to the assertion that there exist similar 
figures that are not congruent. 

Half a century later, Wallis was followed by the most 
persistent and thoroughgoing of all the defenders of 
the parallel postulate, Gerolamo Saccheri, an Italian 
Jesuit who published in 1733, the year of his death, 
a short book called Euclid Freed of Every Flaw. This 
little masterpiece of classical reasoning opens with a 
trichotomy. Unless the parallel postulate is known, the 
angle sum of a triangle may be either less than, equal to, 
or greater than two right angles. Saccheri showed that 
whatever happens in one triangle happens for them all, 
so there are apparently three geometries compatible 
with the core. In the first, every triangle has an angle 
sum less than two right angles (call this case L). In the 
second, every triangle has an angle sum equal to two 
right angles (call this case E). In the third, every trian- 
gle has an angle sum greater than two right angles (call 
this case G). Case E is, of course, Euclidean geometry, 
which Saccheri wished to show was the only case pos- 
sible. He therefore set to work to show that each of the 
other cases independently self-destructed. He was suc- 
cessful with case G, and then turned to case L “which 
alone obstructs the truth of the [parallel] axiom,” as he 
put it. 

Case L proved to be difficult, and during the course 
of his investigations Saccheri established a number of 
interesting propositions. For example, if case L is true, 
then two lines that do not meet have just one common 
perpendicular, and they diverge on either side of it. 
In the end, Saccheri tried to deal with his difficulties 
by relying on foolish statements about the behavior of 
lines at infinity: it was here that his attempted proof 
failed. 

Saccheri’s work sank slowly, though not completely, 
into obscurity. It did, however, come to the atten- 
tion of the Swiss mathematician Johann Heinrich Lam- 
bert, who pursued the trichotomy but, unlike Saccheri, 
stopped short of claiming success in proving the parallel
postulate. Instead the work was abandoned, and was
published only in 1786, after his death. Lambert dis- 
tinguished carefully between unpalatable results and 
impossibilities. He had a sketch of an argument to show 
that in case L the area of a triangle is proportional to 
the difference between two right angles and the angle 
sum of the triangle. He knew that in case L similar tri- 
angles had to be congruent, which would imply that the 
tables of trigonometric functions used in astronomy 
were not in fact valid and that different tables would
have to be produced for every size of triangle. In par- 
ticular, for every angle less than 60° there would be 
precisely one equilateral triangle with that given angle 
at each vertex. This would lead to what philosophers 
called an “absolute” measure of length (one could take, 
for instance, the length of the side of an equilateral triangle
with angles equal to 30°), which Leibniz’s [VI.15]
follower Wolff had said was impossible. And indeed it 
is counterintuitive: lengths are generally defined in rel- 
ative terms, as, for instance, a certain proportion of the 
length of a meter rod in Paris, or of the circumference 
of Earth, or of something similar. But such arguments, 
said Lambert, “were drawn from love and hate, with 
which a mathematician can have nothing to do.” 
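
In modern notation, the area relationship Lambert sketched can be written as below. This is only a restatement under stated assumptions: the symbols are not Lambert's, the angles α, β, γ are measured in radians so that two right angles equal π, and k is an assumed name for the constant that fixes the scale.

```latex
\operatorname{Area}(\triangle) = k^{2}\,\bigl(\pi - (\alpha + \beta + \gamma)\bigr),
\qquad \alpha + \beta + \gamma < \pi .
```

For the equilateral triangle with three angles of 30° mentioned above, the angle sum is π/2, so on this reading its area would be k²π/2. Fixing that one triangle fixes k, which is one way to see where the "absolute" measure of length comes from.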

6 The Shift of Focus around 1800 

The phase of Western interest in the parallel postu- 
late that began with the publication of modern editions 
of Euclid’s Elements started to decline with a further 
turn in that enterprise. After the French revolution, 
Legendre [VI.24] set about writing textbooks, largely
for the use of students hoping to enter the École Poly- 
technique, that would restore the study of elementary 
geometry to something like the rigorous form in which 
it appeared in the Elements. However, it was one thing 
to seek to replace books of a heavily intuitive kind, but
quite another to deliver the requisite degree of rigor. 
Legendre, as he came to realize, ultimately failed in his 
attempt. Specifically, like everyone before him, he was 
unable to give an adequate defense of the parallel pos- 
tulate. Legendre’s Éléments de Géométrie ran to numer- 
ous editions, and from time to time a different attempt 
on the postulate was made. Some of these attempts 
would be hard to describe favorably, but the best can
be extremely persuasive. 

Legendre’s work was classical in spirit, and he still 
took it for granted that the parallel postulate had to 
be true. But by around 1800 this attitude was no longer
universally held. Not everybody thought that the postulate
must, somehow, be defended, and some were prepared
to contemplate with equanimity the idea that it
might be false. No clearer illustration of this shift can 
be found than a brief note sent to Gauss [VI.26] by
F. K. Schweikart, a Professor of Law at the University 
of Marburg, in 1818. Schweikart described in a page 
the main results he had been led to in what he called 
“astral geometry,” in which the angle sum of a triangle 
was less than two right angles: squares had a particular
form, and the altitude of a right-angled isosceles
triangle was bounded by an amount Schweikart called 
“the constant.” Schweikart went so far as to claim that 
the new geometry might even be the true geometry of 
space. Gauss replied positively. He accepted the results, 
and he claimed that he could do all of elementary 
geometry once a value for the constant was given. One 
could argue, somewhat ungenerously, that Schweikart 
had done little more than read Lambert’s posthumous 
book — although the theorem about isosceles triangles 
is new. However, what is notable is the attitude of 
mind: the idea that this new geometry might be true, 
and not just a mathematical curiosity. Euclid’s Elements 
shackled him no more. 

Unfortunately, it is much less clear precisely what
Gauss himself thought. Some historians, mindful of
Gauss’s remarkable mathematical originality, have
been inclined to interpret the evidence in such a way
that Gauss emerges as the first person to discover 
non-Euclidean geometry. The evidence, however, is very 
slight, and difficult to interpret. There are traces of 
some early investigations by Gauss of Euclidean geom- 
etry that include a study of a new definition of parallel 
lines; there are claims made by Gauss late in life that he 
had known this or that fact for many years; and there
are letters he wrote to his friends. But there is no mate- 
rial in the surviving papers that allows us to reconstruct 
what Gauss knew, or that supports the claim that Gauss 
discovered non-Euclidean geometry. 

Rather, the picture would seem to be that Gauss came 
to realize during the 1810s that all previous attempts 
to derive the parallel postulate from the core of Euclid- 
ean geometry had failed and that all future attempts 
would probably fail as well. He became more and more 
convinced that there was another possible geometry 
of space. Geometry ceased, in his mind, to have the 
status of arithmetic, which was a matter of logic, and 
became associated with mechanics, an empirical sci- 
ence. The simplest accurate statement of Gauss's posi- 
tion through the 1820s is that he did not doubt that 
space might be described by a non-Euclidean geometry,
and of course there was only one possibility: that of 
case L described above. It was an empirical matter, but 
one that could not be resolved by land-based measure- 
ments because any departure from Euclidean geometry 
was, evidently, very small. In this view he was sup- 
ported by his friends, such as Bessel and Olbers, both 
professional astronomers. Gauss the scientist was con- 
vinced, but Gauss the mathematician may have retained 
a small degree of doubt, and certainly never devel- 
oped the mathematical theory required to describe 
non-Euclidean geometry adequately. 

One theory available to Gauss from the early 1820s 
was that of differential geometry. Gauss eventually 
published one of his masterworks on this subject,
his Disquisitiones Generales circa Superficies Curvas 
(1827). In it he showed how to describe geometry on 
any surface in space, and how to regard certain fea- 
tures of the geometry of a surface as intrinsic to the sur- 
face and independent of how the surface was embed- 
ded into three-dimensional space. It would have been 
possible for Gauss to consider a surface of constant 
negative curvature [III.80], and to show that triangles 
on such a surface are described by hyperbolic trigonometric
formulas, but he did not do this until the 1840s.
Had he done so, he would have had a surface on which 
the formulas of a geometry satisfying case L apply. 

A surface, however, is not enough. We accept 
the validity of two-dimensional Euclidean geometry 
because it is a simplification of three-dimensional 
Euclidean geometry. Before a two-dimensional geom- 
etry satisfying the hypotheses of case L can be 
accepted, it is necessary to show that there is a plau- 
sible three-dimensional geometry analogous to case L. 
Such a geometry has to be described in detail and 
shown to be as plausible as Euclidean three-dimensional
geometry. This Gauss simply never did.

7 Bolyai and Lobachevskii 

The fame for discovering non-Euclidean geometry goes
to two men, Bolyai [VI.34] in Hungary and Lobachevskii
[VI.31] in Russia, who independently gave very similar
accounts of it. In particular, both men described a
system of geometry in two and three dimensions that
differed from Euclid’s but had an equally good claim to
be the geometry of space. Lobachevskii published first,
in 1829, but only in an obscure Russian journal, and
then in French in 1837, in German in 1840, and again
in French in 1855. Bolyai published his account in 1831,
in an appendix to a two-volume work on geometry by
his father.

Figure 3 The lines n' and n" through P separate the lines
through P that meet the line m from those that do not.

It is easiest to describe their achievements together. 
Both men defined parallels in a novel way, as follows.
Given a point P and a line m there will be some lines
through P that meet m and others that do not. Separating
these two sets will be two lines through P that
do not quite meet m but which might come arbitrarily
close, one to the right of P and one to the left. This situation
is illustrated in figure 3: the two lines in question
are n' and n". Notice that lines on the diagram appear
curved. This is because, in order to represent them on
a flat, Euclidean page, it is necessary to distort them,
unless the geometry is itself Euclidean, in which case
one can put n' and n" together and make a single line
that is infinite in both directions.

Given this new way of talking, it still makes sense to 
talk of dropping the perpendicular from P to the line m. 
The left and right parallels to m through P make equal 
angles with the perpendicular, called the angle of parallelism.
If the angle is a right angle, then the geometry
is Euclidean. However, if it is less than a right angle,
then the possibility arises of a new geometry. It turns 
out that the size of the angle depends on the length 
of the perpendicular from P to m. Neither Bolyai nor 
Lobachevskii expended any effort in trying to show that 
there was not some contradiction in taking the angle of 
parallelism to be less than a right angle. Instead, they 
simply made the assumption and expended a great deal 
of effort on determining the angle from the length of 
the perpendicular. 
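
The dependence of the angle on the length of the perpendicular is usually written today in the following form. The notation here is an assumption made for illustration: Π(d) denotes the angle of parallelism (in radians) for a perpendicular of length d, and k is a positive constant fixing the unit of length.

```latex
\tan\!\left(\frac{\Pi(d)}{2}\right) = e^{-d/k},
\qquad
\lim_{d \to 0} \Pi(d) = \frac{\pi}{2},
\qquad
\lim_{d \to \infty} \Pi(d) = 0 .
```

The first limit matches the remark above that a right angle corresponds to the Euclidean case: for perpendiculars that are very short compared with k, the angle is almost a right angle.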

They both showed that, given a family of lines all parallel
(in the same direction) to a given line, and given
a point on one of the lines, there is a curve through
that point that is perpendicular to each of the lines
(figure 4). 

In Euclidean geometry the curve defined in this way
is the straight line that is at right angles to the family
of parallel lines and that passes through the given
point (figure 5). If, again in Euclidean geometry, one
takes the family of all lines through a common point Q
and chooses another point P, then there will be a curve
through P that is perpendicular to all the lines: the circle
with center Q that passes through P (figure 6).

Figure 4 A curve perpendicular to a family of parallels.

Figure 5 A curve perpendicular to
a family of Euclidean parallels.

Figure 6 A curve perpendicular to
a family of Euclidean lines through a point.

The curve defined by Bolyai and Lobachevskii has 
some of the properties of both these Euclidean con- 
structions: it is perpendicular to all the parallels, but it 
is curved and not straight. Bolyai called such a curve 
an L-curve. Lobachevskii more helpfully called it a 
horocycle, and the name has stuck. 

Their complicated arguments took both men into 
three-dimensional geometry. Here Lobachevskii’s arguments
were somewhat clearer than Bolyai’s, and both
men notably surpassed Gauss. If the figure defining a 
horocycle is rotated about one of the parallel lines, the 
lines become a family of parallel lines in three dimen- 
sions and the horocycle sweeps out a bowl-shaped sur- 
face, called the F-surface by Bolyai and the horosphere 
by Lobachevskii. Both men now showed that something 
remarkable happens. Planes through the horosphere 
cut it either in circles or in horocycles, and if a triangle 
is drawn on a horosphere whose sides are horocycles, 
then the angle sum of such a triangle is two right angles.
To put this another way, although the space that con- 
tains the horosphere is a three-dimensional version of 
case L, and is definitely not Euclidean, the geometry you 
obtain when you restrict attention to the horosphere is
(two-dimensional) Euclidean geometry! 

Bolyai and Lobachevskii also knew that one can draw 
spheres in their three-dimensional space, and they 
showed (though in this they were not original) that the 
formulas of spherical geometry hold independently of 
the parallel postulate. Lobachevskii now used an inge- 
nious construction involving his parallel lines to show 
that a triangle on a sphere determines and is deter- 
mined by a triangle in the plane, which also deter- 
mines and is determined by a triangle on the horo- 
sphere. This implies that the formulas of spherical 
geometry must determine formulas that apply to the 
triangles on the horosphere. On checking through the 
details, Lobachevskii, and in more or less the same way 
Bolyai, showed that the triangles on the horosphere are 
described by the formulas of hyperbolic trigonometry. 

The formulas for spherical geometry depend on the 
radius of the sphere in question. Similarly, the formu- 
las of hyperbolic trigonometry depend on a certain real 
parameter. However, this parameter does not have a 
similarly clear geometrical interpretation. That defect 
apart, the formulas have a number of reassuring prop- 
erties. In particular, they closely approximate the famil- 
iar formulas of plane geometry when the sides of the 
triangles are very small, which helps to explain how 
this geometry could have remained undetected for so 
long — it differs very little from Euclidean geometry in 
small regions of space. Formulas for length and area
can be developed in the new setting: they show that 
the area of a triangle is proportional to the amount by 
which the angle sum of the triangle falls short of two
right angles. Lobachevskii, in particular, seems to have 
felt that the very fact that there were neat and plausible
formulas of this kind was enough reason to accept the 
new geometry. In his opinion, all geometry was about 
measurement, and theorems in geometry were unfailing
connections between measurements expressed by
formulas. His methods produced such formulas, and 
that, for him, was enough. 
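
The claim that the new formulas closely approximate the familiar ones for small triangles can be checked numerically. The sketch below is not taken from Bolyai or Lobachevskii. It assumes the standard hyperbolic form of the Pythagorean theorem for a right-angled triangle, cosh(c/k) = cosh(a/k) cosh(b/k), where k stands for the real parameter mentioned above. The function names are hypothetical.

```python
import math


def hyperbolic_hypotenuse(a: float, b: float, k: float = 1.0) -> float:
    """Hypotenuse of a right-angled hyperbolic triangle with legs a and b.

    Assumes the hyperbolic Pythagorean theorem
    cosh(c/k) = cosh(a/k) * cosh(b/k), with k the scale parameter.
    """
    return k * math.acosh(math.cosh(a / k) * math.cosh(b / k))


def relative_gap(a: float, b: float, k: float = 1.0) -> float:
    """Relative difference between the hyperbolic and Euclidean hypotenuse."""
    euclidean = math.hypot(a, b)  # sqrt(a^2 + b^2)
    return (hyperbolic_hypotenuse(a, b, k) - euclidean) / euclidean


if __name__ == "__main__":
    # Legs small compared with k: the two geometries nearly agree.
    print(relative_gap(1e-3, 1e-3))
    # Legs comparable to k: the difference is plainly visible.
    print(relative_gap(3.0, 3.0))
    # The same legs with a much larger k look Euclidean again.
    print(relative_gap(3.0, 3.0, k=1000.0))
```

What matters is the size of the triangle relative to k: a large value of k makes even sizeable triangles behave almost exactly as Euclid says, which is consistent with the remark that the geometry could have gone undetected.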

Bolyai and Lobachevskii, having produced a descrip- 
tion of a novel three-dimensional geometry, raised the 
question of which geometry is true: is it Euclidean 
geometry or is it the new geometry for some value of 
the parameter that could presumably be determined 
experimentally? Bolyai left matters there, but Loba- 
chevskii explicitly showed that measurements of stellar 
parallax might resolve the question. Here he was unsuc- 
cessful: such experiments are notoriously delicate. 

By and large, the reaction to Bolyai and Lobachev- 
skii’s ideas during their lifetimes was one of neglect 
and hostility, and they died unaware of the success 
their discoveries would ultimately have. Bolyai and his 
father sent their work to Gauss, who replied in 1832 
that he could not praise the work “for to do so would be 
to praise myself,” adding, for extra measure, a simpler 
proof of one of János Bolyai’s opening results. He was,
he said, nonetheless delighted that it was the son of his
old friend who had taken precedence over him. János
Bolyai was enraged, and refused to publish again, thus 
depriving himself of the opportunity to establish his 
priority over Gauss by publishing his work as an article 
in a mathematics journal. Oddly, there is no evidence 
that Gauss knew the details of the young Hungarian’s 
work in advance. More likely, he saw at once how the 
theory would go once he appreciated the opening of 
Bolyai’s account. 

A charitable interpretation of the surviving evidence 
would be that, by 1830, Gauss was convinced of the 
possibility that physical space might be described by 
non-Euclidean geometry, and he surely knew how to 
handle two-dimensional non-Euclidean geometry using 
hyperbolic trigonometry (although no detailed account 
of this survives from his hand). But the three-dimensional
theory was known first to Bolyai and Lobachev-
skii, and may well not have been known to Gauss until 
he read their work. 

Lobachevskii fared little better than Bolyai. His initial
publication of 1829 was savaged in the press by
Ostrogradskii, a much more established figure who 
was, moreover, in St Petersburg, whereas Lobachevskii 
was in provincial Kazan. His account in Journal für die
reine und angewandte Mathematik (otherwise known
as Crelle’s Journal) suffered grievously from referring
to results proved only in the Russian papers from 
which it had been adapted. His booklet of 1840 drew
only one review, of more than usual stupidity. He did,
however, send it to Gauss, who found it excellent and
had Lobachevskii elected to the Göttingen Academy of
Sciences. But Gauss’s enthusiasm stopped there, and 
Lobachevskii received no further support from him. 

Such a dreadful response to a major discovery invites 
analysis on several levels. It has to be said that the defi- 
nition of parallels upon which both men depended was, 
as it stood, inadequate, but their work was not crit- 
icized on that account. It was dismissed with scorn, 
as if it were self-evident that it was wrong: so wrong 
that it would be a waste of time finding the error it 
surely contained, so wrong that the right response was 
to heap ridicule upon its authors or simply to dismiss 
them without comment. This is a measure of the hold 
that Euclidean geometry still had on the minds of most 
people at the time. Even Copernicanism, for example, 
and the discoveries of Galileo drew a better reception 
from the experts. 

8 Acceptance of Non-Euclidean Geometry 

When Gauss died in 1855, an immense amount of un- 
published mathematics was found among his papers. 
Among it was evidence of his support for Bolyai and 
Lobachevskii, and his correspondence endorsing the 
possible validity of non-Euclidean geometry. As this 
was gradually published, the effect was to send peo- 
ple off to look for what Bolyai and Lobachevskii had 
written and to read it in a more positive light. 

Quite by chance, Gauss had also had a student at 
Göttingen who was capable of moving the matter decisively
forward, even though the actual amount of contact
between the two was probably quite slight. This
was Riemann [VI.49]. In 1854 he was called to defend
his Habilitation thesis, the postdoctoral qualification 
that was a German mathematician’s license to teach 
in a university. As was the custom, he offered three 
titles and Gauss, who was his examiner, chose the one 
Riemann least expected: “On the hypotheses that lie at 
the foundation of geometry.” The paper, which was to 
be published only posthumously, in 1867, was nothing 
less than a complete reformulation of geometry. 

Riemann proposed that geometry was the study of 
what he called manifolds [I.3 §§6.9, 6.10]. These were
“spaces” of points, together with a notion of distance 
that looked like Euclidean distance on small scales but 
which could be quite different at larger scales. This kind 
of geometry could be done in a variety of ways, he sug- 
gested, by means of the calculus. It could be carried 
out for manifolds of any dimension, and in fact Riemann
was even prepared to contemplate manifolds for
which the dimension was infinite. 

A vital aspect of Riemann’s geometry, in which he 
followed the lead of Gauss, was that it was concerned 
only with those properties of the manifold that were 
intrinsic, rather than properties that depended on some 
embedding into a larger space. In particular, the dis- 
tance between two points x and y was defined to be 
the length of the shortest curve joining x and y that 
lay entirely within the surface. Such curves are called 
geodesics. (On a sphere, for example, the geodesics are 
arcs of great circles.)

Even two-dimensional manifolds could have different
intrinsic curvatures (indeed, a single two-dimensional
manifold could have different curvatures in different
places), so Riemann’s definition led to infinitely
many genuinely distinct geometries in each dimension.
Furthermore, these geometries were best defined without
reference to a Euclidean space that contained them,
so the hegemony of Euclidean geometry was broken 
once and for all. 

As the word “hypotheses” in the title of his thesis 
suggests, Riemann was not at all interested in the sorts 
of assumptions needed by Euclid. Nor was he much
interested in the opposition between Euclidean and 
non-Euclidean geometry. He made a small reference 
at the start of his paper to the murkiness that lay at 
the heart of geometry, despite the efforts of Legendre, 
and toward the end he considered the three different 
geometries on two-dimensional manifolds for which 
the curvature is constant. He noted that one was spheri- 
cal geometry, another was Euclidean geometry, and the 
third was different again, and that in each case the angle 
sums of all triangles could be calculated as soon as one 
knew the sum of the angles of any one triangle. But 
he made no reference to Bolyai or Lobachevskii, merely 
noting that if the geometry of space was indeed a three- 
dimensional geometry of constant curvature, then to 
determine which geometry it was would involve tak- 
ing measurements in unfeasibly large regions of space. 
He did discuss generalizations of Gauss’s curvature to
spaces of arbitrary dimension, and he showed what 
metrics [III.58] (that is, definitions of distance) there
could be on spaces of constant curvature. The formula
he wrote down is very general, but as with Bolyai and 
Lobachevskii it depended on a certain real parameter— 
the curvature. When the curvature is negative, his defi- 
nition of distance gives a description of non-Euclidean 
geometry. 


Riemann died in 1866, and by the time his thesis was 
published an Italian mathematician, Eugenio Beltrami, 
had independently come to some of the same ideas. 
He was interested in what the possibilities were if one 
wished to map one surface to another. For example, one 
might ask, for some particular surface S, whether it is 
possible to find a map from S to the plane such that 
the geodesics in S are mapped to straight lines in the 
plane. He found that the answer was yes if and only if 
the space has constant curvature. There is, for example, 
a well-known map from the hemisphere to a plane with 
this property. Beltrami found a simple way of modifying
the formula so that now it defined a map from a
surface of constant negative curvature onto the inte- 
rior of a disk, and he realized the significance of what 
he had done: his map defined a metric on the interior 
of the disk, and the resulting metric space obeyed the 
axioms for non-Euclidean geometry; therefore, those 
axioms would not lead to a contradiction. 

Some years earlier, Minding, in Germany, had found a 
surface, sometimes called the pseudosphere, that had 
constant negative curvature. It was obtained by rotat- 
ing a curve called the tractrix about its axis. This sur- 
face has the shape of a bugle, so it seemed rather less 
natural than the space of Euclidean plane geometry 
and unsuitable as a rival to it. The pseudosphere was 
independently rediscovered by Liouville [VI.39] some
years later, and Codazzi learned of it from that source 
and showed that triangles on this surface are described 
by the formulas of hyperbolic trigonometry. But none 
of these men saw the connection to non-Euclidean 
geometry— that was left to Beltrami. 

Beltrami realized that his disk depicted an infinite 
space of constant negative curvature, in which the 
geometry of Lobachevskii (he did not know at that time 
of Bolyai’s work) held true. He saw that it related to the 
pseudosphere in a way similar to the way that a plane 
relates to an infinite cylinder. After a period of some 
doubt, he learned of Riemann’s ideas and realized that 
his disk was in fact as good a depiction of the space
of non-Euclidean geometry as any could be; there was 
no need to realize his geometry as that of a surface in 
Euclidean three-dimensional space. He thereupon pub- 
lished his essay, in 1868. This was the first time that 
sound foundations had been publicly given for the area
of mathematics that could now be called non-Euclidean 
geometry. 

In 1871 the young Klein [VI.57] took up the subject.
He already knew that the English mathematician
Cayley [VI.46] had contrived a way of introducing
Euclidean metrical concepts into projective geometry
[I.3 §6.7]. While studying at Berlin, Klein saw a
way of generalizing Cayley’s idea and exhibiting Bel- 
trami’s non-Euclidean geometry as a special case of 
projective geometry. His idea met with the disapproval 
of Weierstrass [VI.44], the leading mathematician in
Berlin, who objected that projective geometry was not 
a metrical geometry: therefore, he claimed, it could not 
generate metrical concepts. However, Klein persisted 
and in a series of three papers, in 1871, 1872, and 
1873, showed that all the known geometries could be 
regarded as subgeometries of projective geometry. His 
idea was to recast geometry as the study of a group 
acting on a space. Properties of figures (subsets of the 
space) that remain invariant under the action of the 
group are the geometric properties. So, for example, 
in a projective space of some dimension, the appropri- 
ate group for projective geometry is the group of all 
transformations that map lines to lines, and the sub- 
group that maps the interior of a given conic to itself 
may be regarded as the group of transformations of 
non-Euclidean geometry (see the box on p. 94). (For a 
fuller discussion of Klein’s approach to geometry, see 
[I.3 §6].)

In the 1870s Klein’s message was spread by the first 
and third of these papers, which were published in 
the recently founded journal Mathematische Annalen. 
As Klein’s prestige grew, matters changed, and by the 
1890s, when he had the second of the papers repub- 
lished and translated into several languages, it was this, 
the Erlangen Program, that became well-known. It is 
named after the university where Klein became a pro- 
fessor, at the remarkably young age of twenty-three, 
but it was not his inaugural address. (That was about 
mathematics education.) For many years it was a singu- 
larly obscure publication, and it is unlikely that it had 
the effect on mathematics that some historians have 
come to suggest. 

9 Convincing Others 

Klein’s work directed attention away from the figures 
in geometry and toward the transformations that do 
not alter the figures in crucial respects. For example, 
in Euclidean geometry the important transformations 
are the familiar rotations and translations (and reflec- 
tions, if one chooses to allow them). These correspond 
to the motions of rigid bodies that contemporary psychologists
saw as part of the way in which individuals
learn the geometry of the space around them. But
this theory was philosophically contentious, especially
when it could be extended to another metrical geom- 
etry, non-Euclidean geometry. Klein prudently entitled 
his main papers “On the so-called non-Euclidean geom- 
etry,” to keep hostile philosophers at bay (in particular 
Lotze, who was the well-established Kantian philosopher
at Göttingen). But with these papers and the previous
work of Beltrami the case for non-Euclidean geometry
was made, and almost all mathematicians were
persuaded. They believed, that is, that alongside Euclid- 
ean geometry there now stood an equally valid mathe- 
matical system called non-Euclidean geometry. As for 
which one of these was true of space, it seemed so 
clear that Euclidean geometry was the sensible choice 
that there appears to have been little or no discus- 
sion. Lipschitz showed that it was possible to do all 
of mechanics in the new setting, and there the matter 
rested, a hypothetical case of some charm but no more. 
Helmholtz, the leading physicist of his day, became 
interested — he had known Riemann personally — and 
gave an account of what space would have to be if it 
was learned about through the free mobility of bodies.
His first account was deeply flawed, because he
was unaware of non-Euclidean geometry, but when Bel- 
trami pointed this out to him he reworked it (in 1870). 
The reworked version also suffered from mathematical 
deficiencies, which were pointed out somewhat later by 
Lie [VI.53], but he had more immediate trouble from
philosophers. 

Their question was, “What sort of knowledge is this 
theory of non-Euclidean geometry?” Kantian philoso- 
phy was coming back into fashion, and in Kant's view 
knowledge of space was a fundamental pure a priori 
intuition, rather than a matter to be determined by 
experiment: without this intuition it would be impos- 
sible to have any knowledge of space at all. Faced with 
a rival theory, non-Euclidean geometry, neo-Kantian 
philosophers had a problem. They could agree that the 
mathematicians had produced a new and prolonged 
logical exercise, but could it be knowledge of the world? 
Surely the world could not have two kinds of geom- 
etry? Helmholtz hit back, arguing that knowledge of 
Euclidean geometry and non-Euclidean geometry would 
be acquired in the same way— through experience— but 
these empiricist overtones were unacceptable to the 
philosophers, and non-Euclidean geometry remained a 
problem for them until the early years of the twentieth 
century. 

Mathematicians could not in fact have given a completely
rigorous defense of what was becoming the
Cross-ratios and distances in conics. A projective
transformation of the plane sends four distinct
points on a line, A, B, C, D, to four distinct collinear
points, A', B', C', D', in such a way that the quantity

$$\frac{AB}{AD}\,\frac{CD}{CB}$$

is preserved: that is,

$$\frac{AB}{AD}\,\frac{CD}{CB} = \frac{A'B'}{A'D'}\,\frac{C'D'}{C'B'}.$$

This quantity is called the cross-ratio of the four
points A, B, C, D, and is written CR(A, B, C, D).

In 1871, Klein described non-Euclidean geometry
as the geometry of points inside a fixed conic, K,
where the transformations allowed are the projective
transformations that map K to itself and its
interior to its interior (see figure 7). To define the
distance between two points P and Q inside K, Klein
noted that if the line PQ is extended to meet K at A
and D, then the cross-ratio CR(A, P, D, Q) does not
change if one applies a projective transformation:
that is, it is a projective invariant. Moreover, if R is
a third point on the line PQ and the points lie in the
order P, Q, R, then CR(A, P, D, Q) CR(A, Q, D, R) =
CR(A, P, D, R). Accordingly, he defined the distance
between P and Q as $d(PQ) = -\tfrac{1}{2}\log \mathrm{CR}(A, P, D, Q)$
(the factor of $-\tfrac{1}{2}$ is introduced to facilitate the later
introduction of trigonometry). With this definition,
distance is additive along a line: d(PQ) + d(QR) =
d(PR).
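
The additivity claimed in the box can be checked numerically. The sketch below is only an illustration under stated assumptions: the conic K is taken to be the unit circle, the line PQ is taken to be a diameter so that A and D sit at coordinates -1 and 1, and the function names cross_ratio and klein_distance are hypothetical, not Klein's notation.

```python
import math


def cross_ratio(a, b, c, d):
    """CR(A, B, C, D) = (AB/AD) * (CD/CB) for four collinear points,
    each given by its coordinate along the common line."""
    return (abs(b - a) / abs(d - a)) * (abs(d - c) / abs(b - c))


def klein_distance(p, q, end_a=-1.0, end_d=1.0):
    """d(PQ) = -1/2 * log CR(A, P, D, Q), where the line PQ meets the
    conic at A and D (assumed here to be the ends of a diameter)."""
    return -0.5 * math.log(cross_ratio(end_a, p, end_d, q))


def shift(x, t):
    """A projective map of the line that fixes A = -1 and D = 1
    (chosen for illustration): x -> (x + t) / (1 + t*x)."""
    return (x + t) / (1 + t * x)


# Three points in the order P, Q, R on the diameter.
p, q, r = 0.0, 0.5, 0.8

# Distance is additive along the line: d(PQ) + d(QR) = d(PR).
lhs = klein_distance(p, q) + klein_distance(q, r)
rhs = klein_distance(p, r)
print(math.isclose(lhs, rhs))

# Distance is unchanged by the projective map that fixes A and D.
moved = klein_distance(shift(p, 0.3), shift(q, 0.3))
print(math.isclose(moved, klein_distance(p, q)))
```

Both checks print True up to floating-point tolerance. Restricting to a diameter loses nothing for this purpose, since the cross-ratio only involves the four collinear points A, P, Q, D.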


accepted position, but as the news spread that there 
were two possible descriptions of space, and that one
could therefore no longer be certain that Euclidean 
geometry was correct, the educated public took up the 
question: what was the geometry of space? Among the 
first to grasp the problem in this new formulation was 
Poincaré [VI.61]. He came to mathematical fame in the
early 1880s with a remarkable series of essays in which 
he reformulated Beltrami’s disk model so as to make 
it conformal: that is, so that angles in non-Euclidean 
geometry were represented by the same angles in the 
model. He then used his new disk model to connect
complex function theory, the theory of linear differential
equations, Riemann surface [III.81] theory, and
non-Euclidean geometry to produce a rich new body 
of ideas. Then, in 1891, he pointed out that the disk 
model permitted one to show that any contradiction 
in non-Euclidean geometry would yield a contradiction 
in Euclidean geometry as well, and vice versa. There- 
fore, Euclidean geometry was consistent if and only if 
non-Euclidean geometry was consistent. A curious con- 
sequence of this was that if anybody had managed to 
derive the parallel postulate from the core of Euclidean 
geometry, then they would have inadvertently proved 
that Euclidean geometry was inconsistent! 

One obvious way to try to decide which geometry
described the actual universe was to appeal to physics.
But Poincaré was not convinced by this. He argued in
another paper (1902) that experience was open to many
interpretations and there was no logical way of deciding
what belonged to mathematics and what to physics.
Imagine, for example, an elaborate set of measurements
of angle sums of figures, perhaps on an astronomical
scale. Something would have to be taken to be
straight, perhaps the paths of rays of light. Suppose,
finally, that the conclusion is that the angle sum of a triangle
is indeed less than two right angles by an amount
proportional to the area of the triangle. Poincaré said
that there were two possible conclusions: light rays are
straight and the geometry of space is non-Euclidean;
or light rays are somehow curved, and space is Euclidean.
Moreover, he continued, there was no logical way
to choose between these possibilities. All one could do
was to make a convention and abide by it, and the sensible
convention was to choose the simpler geometry:
Euclidean geometry.

Figure 7 Three points, P, Q, and R, on a non-Euclidean
straight line in Klein’s projective model of non-Euclidean
geometry.

This philosophical position was to have a long life in 
the twentieth century under the name of convention- 
alism, but it was far from accepted in Poincaré's life- 
time. A prominent critic of conventionalism was the 
Italian Federigo Enriques, who, like Poincaré, was both a
powerful mathematician and a writer of popular essays
on issues in science and philosophy. He argued that
one could decide whether a property was geometri- 
cal or physical by seeing whether we had any control 
over it. We cannot vary the law of gravity, but we can 
change the force of gravity at a point by moving mat- 
ter around. Poincaré had compared his disk model to 
a metal disk that was hot in the center and got cooler 
as one moved outwards. He had shown that a simple 
law of cooling produced figures identical to those of 
non-Euclidean geometry. Enriques replied that heat was 
likewise something we can vary. A property such as 
Poincaré invoked, which was truly beyond our control, 
was not physical but geometric. 

10 Looking Ahead 

In the end, the question was not resolved in its 
own terms. Two developments moved mathemati- 
cians beyond the simple dichotomy posed by Poincaré. 
Starting in 1899, Hilbert [VI.63] began an extensive
rewriting of geometry along axiomatic lines, which 
eclipsed earlier ideas of some Italian mathematicians 
and opened the way to axiomatic studies of many 
kinds. Hilbert’s work captured very well the idea that 
if mathematics is sound, it is sound because of the 
nature of its reasoning, and led to profound investi- 
gations in mathematical logic. And in 1915 Einstein 
proposed his general theory of relativity, which is in 
large part a geometric theory of gravity. Confidence 
in mathematics was restored; our sense of geometry 
was much enlarged, and our insights into the rela- 
tionships between geometry and space became consid- 
erably more sophisticated. Einstein made full use of 
contemporary ideas about geometry, and his achieve- 
ment would have been unthinkable without Riemann’s 
work. He described gravity as a kind of curvature in the 
four-dimensional manifold of spacetime (see general
relativity and the Einstein equations [IV.17]). His
work led to new ways of thinking about the large-scale 
structure of the universe and its ultimate fate, and to 
questions that remain unanswered to this day. 
II.3 The Development of
Abstract Algebra

Karen Hunger Parshall 


1 Introduction 

What is algebra? To the high-school student encoun- 
tering it for the first time, algebra is an unfamiliar 
abstract language of x’s and y’s, a’s and b’s, together
with rules for manipulating them. These letters, some 
of them variables and some constants, can be used 
for many purposes. For example, one can use them to 
express straight lines as equations of the form y = 
ax + b, which can be graphed and thereby visualized in 
the Cartesian plane. Furthermore, by manipulating and 
interpreting these equations, it is possible to determine 
such things as what a given line's root is (if it has one)— 
that is, where it crosses the x-axis — and what its slope 
is — that is, how steep or flat it appears in the plane 
relative to the axis system. There are also techniques 
for solving simultaneous equations, or equivalently for 
determining when and where two lines intersect (or 
demonstrating that they are parallel). 

Just when there already seem to be a lot of tech- 
niques and abstract manipulations involved in deal- 
ing with lines, the ante is upped. More complicated 
curves like quadratics, $y = ax^2 + bx + c$, and even
cubics, $y = ax^3 + bx^2 + cx + d$, and quartics, $y =
ax^4 + bx^3 + cx^2 + dx + e$, enter the picture, but the
same sort of notation and rules apply, and similar sorts 
of questions are asked. Where are the roots of a given 
curve? Given two curves, where do they intersect? 

Suppose now that the same high-school student, hav- 
ing mastered this sort of algebra, goes on to university 
and attends an algebra course there. Essentially gone 
are the by now familiar x’s, y’s, a’s, and b’s; essentially
gone are the nice graphs that provide a way to
picture what is going on. The university course reflects 
some brave new world in which the algebra has some- 
how become “modern.” This modern algebra involves 
abstract structures (groups [I.3 §2.1], rings [III.83 §1],
fields [I.3 §2.2], and other so-called objects), each one
defined in terms of a relatively small number of axioms 
and built up of substructures like subgroups, ideals, 
and subfields. There is a lot of moving around between 
these objects, too, via maps like group homomorphisms
and ring automorphisms [I.3 §4.1]. One objective
of this new type of algebra is to understand the
underlying structure of the objects and, in doing so, to 
build entire theories of groups or rings or fields. These 
abstract theories may then be applied in diverse set- 
tings where the basic axioms are satisfied but where it 
may not be at all apparent a priori that a group or a ring 
or a field may be lurking. This, in fact, is one of modern
algebra’s great strengths: once we have proved a general
fact about an algebraic structure, there is no need
to prove that fact separately each time we come across
an instance of that structure. This abstract approach
allows us to recognize that contexts that may look quite
different are in fact importantly similar.

How is it that two endeavors (the high-school analysis
of polynomial equations and the modern algebra of
the research mathematician), so seemingly different in
their objectives, in their tools, and in their philosophical
outlooks, are both called “algebra”? Are they even
related? In fact, they are, but the story of how they are
is long and complicated.

2 Algebra before There Was Algebra:
From Old Babylon to the Hellenistic Era

Figure 1 The sixth proposition from Euclid’s Book II.

Solutions of what would today be recognized as first- and
second-degree polynomial equations may be found
in Old Babylonian cuneiform texts that date to the second
millennium b.c.e. However, these problems were
neither written in a notation that would be recognizable
to our modern-day high-school student nor solved
using the kinds of general techniques so characteristic
of the high-school algebra classroom. Rather, particular
problems were posed, and particular solutions
obtained, from a series of recipe-like steps. No general
theoretical justification was given, and the problems
were largely cast geometrically, in terms of measurable
line segments and surfaces of particular areas. Consider,
for example, this problem, translated and transcribed
from a clay tablet held in the British Museum
(catalogued as BM 13901, problem 1) that dates from
between 1800 and 1600 b.c.e.:

The surface of my confrontation I have accumulated: 
45' is it. 1, the projection, you posit. The moiety of 1 
you break, 30' and 30' you make hold. 15' to 45' you 
append: by 1, 1 is equalside. 30' which you have made 
hold in the inside you tear out: 30' the confrontation. 

This may be translated into modern notation as the
equation $x^2 + 1x = \tfrac{3}{4}$, where it is important to notice
that the Babylonian number system is base 60, so 45'
denotes $\tfrac{45}{60} = \tfrac{3}{4}$. The text then lays out the following
algorithm for solving the problem: take 1, the coefficient
of the linear term, and halve it to get $\tfrac{1}{2}$. Square $\tfrac{1}{2}$
to get $\tfrac{1}{4}$. Add $\tfrac{1}{4}$ to $\tfrac{3}{4}$, the constant term, to get 1. This is
the square of 1. Subtract from this 1 the $\tfrac{1}{2}$ which you
multiplied by itself, to get $\tfrac{1}{2}$, the side of the square. The modern
reader can easily see that this algorithm is equivalent to
what is now called the quadratic formula, but the Babylonian
tablet presents it in the context of a particular
problem and repeats it in the contexts of other particular
problems. There are no equations in the modern
sense; the Babylonian writer is literally effecting a construction
of plane figures. Similar problems and similar
algorithmic solutions can also be found in ancient
Egyptian texts such as the Rhind papyrus, believed to
have been copied in 1650 b.c.e. from a text that was
about a century and a half older.
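
The recipe on the tablet can be followed step by step in code. The sketch below is a modern illustration, not a reconstruction of Babylonian practice: it assumes the problem has the shape x^2 + bx = c with a right-hand side that makes the completed square an exact rational square, and the names exact_sqrt and babylonian_side are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def exact_sqrt(q: Fraction) -> Fraction:
    """Square root of a non-negative rational that is a perfect square."""
    if q < 0:
        raise ValueError("negative quantities have no side")
    root = Fraction(isqrt(q.numerator), isqrt(q.denominator))
    if root * root != q:
        raise ValueError(f"{q} is not a perfect square")
    return root


def babylonian_side(b: Fraction, c: Fraction) -> Fraction:
    """Positive solution of x^2 + b*x = c, following the tablet's steps."""
    half = b / 2                # halve the coefficient of the linear term
    square = half * half        # square the half
    total = square + c          # add the square to the constant term
    side = exact_sqrt(total)    # find the side of the completed square
    return side - half          # subtract the half to get the unknown


# The tablet's problem: x^2 + 1x = 45' = 3/4.
print(babylonian_side(Fraction(1), Fraction(3, 4)))  # 1/2
```

Exact fractions are used instead of floating point so that base-60 values such as 45' = 3/4 are represented without rounding.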

The problem-oriented, untheoretical approach to
mathematics characteristic of texts from this early
period contrasts sharply with the axiomatic and deductive
approach that Euclid [VI.2] introduced into mathematics
in around 300 b.c.e. in his magisterial, geometrical
treatise, the Elements. (See geometry [II.2] for a further
discussion of this work.) There, building on explicit
definitions and a small number of axioms or self-evident
truths, Euclid proceeded to deduce known
(and almost certainly some hitherto unknown) results
within a strictly geometrical context. Geometry done
in this axiomatic context defined Euclid’s standard of
rigor. But what does this quintessentially geometrical
text have to do with algebra? Consider the sixth proposition
in Euclid’s Book II, ostensibly a book on plane
figures, and in particular quadrilaterals:

If a straight line be bisected and a straight line be 
added to it in a straight line, the rectangle contained by
the whole with the added straight line and the added 
straight line together with the square on the half is 
equal to the square on the straight line made up of 
the half and the added straight line. 

While clearly a geometrical construction, it equally 
clearly describes two constructions— one a rectangle 
and one a square— that have equal areas. It therefore 
describes something that we should be able to write as 
an equation. Figure 1 gives the picture corresponding 
to Euclid’s construction: he proves that the area of rect- 
angle ADMK equals the sum of rectangles CDML and 
HMFG. To do this, he adds the square on CB— namely, 
square LHGE — to CDML and HMFG. This gives square 
CDFE. It is not hard to see that this is equivalent to the 
high-school procedure of “completing the square” and 
to the algebraic equation $(2a + b)b + a^2 = (a + b)^2$,
which we obtain by setting CB = a and BD = b. Equiv- 
alent, yes, but for Euclid this is a specific geometrical 
construction and a particular geometrical equivalence. 
For this reason, he could not deal with anything but 
positive real quantities, since the sides of a geometrical 
figure could only be measured in those terms. Nega- 
tive quantities did not and could not enter into Euclid’s 
fundamentally geometrical mathematical world. Never- 
theless, in the historical literature, Euclid's Book II has 
often been described as dealing with “geometrical alge- 
bra,” and, because of our easy translation of the book’s 
propositions into the language of algebra, it has been 
argued, albeit ahistorically, that Euclid had algebra but 
simply presented it geometrically. 

Although Euclid's geometrical standard of rigor came 
to be regarded as a pinnacle of mathematical achieve- 
ment, it was in many ways not typical of the math- 
ematics of classical Greek antiquity, a mathematics 
that focused less on systematization and more on the 
clever and individualistic solution of particular problems.
There is perhaps no better exemplar of this than
Archimedes [VI.3], held by many to have been one
of the three or four greatest mathematicians of all 
time. Still, Archimedes, like Euclid, posed and solved
particular problems geometrically. As long as geometry
defined the standard of rigor, not only negative
numbers but also what we would recognize as polynomial
equations of degree higher than three effectively
fell outside the sphere of possible mathematical
discussion. (As in the example from Euclid above,
quadratic polynomials result from the geometrical pro- 
cess of completing the square; cubics could conceiv- 
ably result from the geometrical process of completing 
the cube; but quartics and higher-degree polynomials 
could not be constructed in this way in familiar, three- 
dimensional space.) However, there was another mathe- 
matician of great importance to the present story, Dio- 
phantus of Alexandria (who was active in the middle 
of the third century c.e.). Like Archimedes, he posed 
particular problems, but he solved them in an algorithmic
style much more reminiscent of the Old Babylonian
texts than of Archimedes’ geometrical constructions,
and as a result he was able to begin to exceed the
bounds of geometry. 

In his text Arithmetica, Diophantus put forward gen- 
eral, indeterminate problems, which he then restricted 
by specifying that the solutions should have partic- 
ular forms, before providing specific solutions. He 
expressed these problems in a very different way from 
the purely rhetorical style that held sway for centuries 
after him. His notation was more algebraic and was ulti- 
mately to prove suggestive to sixteenth-century math- 
ematicians (see below). In particular, he used special 
abbreviations that allowed him to deal with the first six 
positive and negative powers of the unknown as well 
as with the unknown to the zeroth power. Thus, what- 
ever his mathematics was, it was not the “geometrical 
algebra” of Euclid and Archimedes. 

Consider, for example, this problem from Book II
of the Arithmetica: “To find three numbers such that
the square of any one of them minus the next following
gives a square.” In terms of modern notation,
he began by restricting his attention to solutions of
the form $(x + 1, 2x + 1, 4x + 1)$. It is easy to see that
$(x + 1)^2 - (2x + 1) = x^2$ and $(2x + 1)^2 - (4x + 1) = 4x^2$, so
two of the conditions of the problem are immediately
satisfied, but he needed $(4x + 1)^2 - (x + 1) = 16x^2 + 7x$
to be a square as well. Arbitrarily setting $16x^2 + 7x =
25x^2$, Diophantus then determined that $x = \tfrac{7}{9}$ gave him
what he needed, so a solution was $\tfrac{16}{9}, \tfrac{23}{9}, \tfrac{37}{9}$, and he was
done. He provided no geometrical justification because
in his view none was needed; a single numerical solution
was all he required. He did not set up what we
would recognize as a more general set of equations and
try to find all possible solutions.
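
Diophantus's single numerical solution is easy to verify with exact arithmetic. The following sketch is a modern check, not his method: the value x = 7/9 comes from solving 16x^2 + 7x = 25x^2 as described above, and the helper names is_rational_square, diophantus_triple, and satisfies_problem are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def is_rational_square(q: Fraction) -> bool:
    """True if q is the square of a rational number.
    A Fraction is stored in lowest terms, so it suffices to test
    numerator and denominator separately."""
    if q < 0:
        return False
    n, d = q.numerator, q.denominator
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d


def diophantus_triple(x: Fraction):
    """The restricted form (x + 1, 2x + 1, 4x + 1) used in the text."""
    return (x + 1, 2 * x + 1, 4 * x + 1)


def satisfies_problem(triple) -> bool:
    """Square of each number minus the next (cyclically) is a square."""
    return all(
        is_rational_square(triple[i] ** 2 - triple[(i + 1) % 3])
        for i in range(3)
    )


# 16x^2 + 7x = 25x^2 gives 9x^2 = 7x, so x = 7/9 (x = 0 is discarded).
x = Fraction(7, 9)
triple = diophantus_triple(x)
print(", ".join(str(t) for t in triple))  # 16/9, 23/9, 37/9
print(satisfies_problem(triple))          # True
```

The three differences are 49/81, 196/81, and 1225/81, the squares of 7/9, 14/9, and 35/9 respectively.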

Diophantus, who lived more than four centuries after
Archimedes’ death, was doing neither geometry nor 
algebra in our modern sense, yet the kinds of problems 
and the sorts of solutions he obtained for them were 
very different from those found in the works of either 
Euclid or Archimedes. The extent to which Diophantus
created a wholly new approach, rather than drawing on 
an Alexandrian tradition of what might be called “algo- 
rithmic algebraic,” as opposed to “geometric algebraic,” 
scholarship is unknown. It is clear that by the time Dio- 
phantus’s ideas were introduced into the Latin West in 
the sixteenth century, they suggested new possibilities 
to mathematicians long conditioned to the authority of 
geometry. 

3 Algebra before There Was Algebra:
The Medieval Islamic World

The transmission of mathematical ideas was, however, 
a complex process. After the fall of the Roman Empire
and the subsequent decline of learning in the West, 
both the Euclidean and the Diophantine traditions ulti- 
mately made their way into the medieval Islamic world. 
There they were not only preserved (thanks to the
active translation initiatives of Islamic scholars) but
also studied and extended.

al-Khwārizmī [VI.5] was a scholar at the royally
funded House of Wisdom in Baghdad. He linked the 
kinds of geometrical arguments Euclid had presented 
in Book II of his Elements with the indigenous problem- 
solving algorithms that dated back to Old Babylonian 
times. In particular, he wrote a book on practical math-
ematics, entitled al-Kitāb al-mukhtaṣar fī ḥisāb al-jabr
wa’l-muqābala (“The compendious book on calcula-
tion by completion and balancing”), beginning it with
a theoretical discussion of what we would now recog-
nize as polynomial equations of the first and second
degrees. (The latinization of the word “al-jabr” or “com-
pletion” in his title gave us our modern term “alge-
bra.”) Because he employed neither negative numbers
nor zero coefficients, al-Khwārizmī provided a system-
atization in terms of six separate kinds of examples
where we would need just one, namely $ax^2 + bx + c = 0$.
He considered, for example, the case when “a square 
and 10 roots are equal to 39 units,” and his algorith- 
mic solution in terms of multiplications, additions, and 
subtractions was in precisely the same form as the 
above solution from tablet BM 13901. This, however,
was not enough for al-Khwārizmī. “It is necessary,” he
said, “that we should demonstrate geometrically the 
truth of the same problems which we have explained 
in numbers,” and he proceeded to do this by “complet- 
ing the square” in geometrical terms reminiscent of, 
but not as formal as, those Euclid used in Book II. (Abū
Kāmil (ca. 850-930), an Egyptian Islamic mathemati-
cian of the generation after al-Khwārizmī, introduced a
higher level of Euclidean formality into the geometric- 
algorithmic setting.) This juxtaposition made explicit 
how the relationships between geometrical areas and 
lines could be interpreted in terms of numerical multi- 
plications, additions, and subtractions, a key step that 
would ultimately suggest a move away from the geo- 
metrical solution of particular problems and toward an 
algebraic solution of general types of equations. 
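
The case “a square and 10 roots are equal to 39 units” corresponds in modern notation to $x^2 + 10x = 39$. The sketch below spells out the familiar completing-the-square recipe as a sequence of arithmetic steps. It is a modern paraphrase under stated assumptions, not a transcription of al-Khwārizmī's wording; the function name is hypothetical, and only exact integer cases are handled.

```python
from math import isqrt


def solve_square_plus_roots(b: int, c: int) -> int:
    """Positive root of x^2 + b*x = c by completing the square.

    Illustrative only: assumes b is even and (b/2)^2 + c is a perfect
    square, as in the worked case x^2 + 10x = 39.
    """
    if b % 2:
        raise ValueError("this sketch only handles an even number of roots")
    half = b // 2                  # halve the number of roots: 5
    completed = half * half + c    # add its square to the units: 25 + 39 = 64
    side = isqrt(completed)        # side of the completed square: 8
    if side * side != completed:
        raise ValueError("this sketch only handles exact integer cases")
    return side - half             # subtract the half: 8 - 5 = 3


root = solve_square_plus_roots(10, 39)
```

Geometrically, the number 25 added in the second step is the area of the small square that "completes" the larger square of side $x + 5$.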

Another step along this path was taken by the math- 
ematician and poet Omar Khayyam (ca. 1050-1130) in 
a book he entitled Al-jabr after al-Khwārizmī’s work.
Here he proceeded to systematize and solve what we 
would recognize, in the absence of both negative num- 
bers and zero coefficients, as the cases of the cubic 
equation. Following al-Khwārizmī, Khayyam provided
geometrical justifications, yet his work, even more than
that of his predecessor, may be seen as closer to a
general problem-solving technique for specific cases of
equations, that is, closer to the notion of algebra.

The Persian mathematician al-Karaji (who flourished 
in the early eleventh century) also knew well and 
appreciated the geometrical tradition stemming from 
Euclid’s Elements. However, like Abū Kāmil, he was
aware of the Diophantine tradition too, and synthe- 
sized in more general terms some of the procedures 
Diophantus had laid out in the context of specific exam- 
ples in the Arithmetica. Although Diophantus’s ideas 
and style were known to these and other medieval 
Islamic mathematicians, they would remain unknown 
in the Latin West until their rediscovery and trans- 
lation in the sixteenth century. Equally unknown in 
the Latin West were the accomplishments of Indian 
mathematicians, who had succeeded in solving some 
quadratic equations algorithmically by the beginning 
of the eighth century and who, like Brahmagupta four
hundred years later, had techniques for finding inte-
ger solutions to particular examples of what are today
called Pell's equations, namely, equations of the form
$ax^2 + b = y^2$, where a and b are integers and a is not
a square.

4 Algebra before There Was Algebra: The Latin West

Concurrent with the rise of Islam in the East, the
Latin West underwent a gradual cultural and polit-
ical stabilization in the centuries following the fall
of the Roman Empire. By the thirteenth century, this
relative stability had resulted in the firm entrench-
ment of the Catholic Church as well as the establish-
ment both of universities and of an active economy.
Moreover, the Islamic conquest of most of the Iberian
peninsula in the eighth century and the subsequent
establishment there of an Islamic court, library, and
research facility similar to the House of Wisdom in
Baghdad brought the fruits of medieval Islamic schol-
arship to western Europe’s doorstep. However, as Islam
found its position on the Iberian peninsula increasingly
compromised in the twelfth and thirteenth centuries,
this Islamic learning, as well as some of the ancient
Greek scholarship that the medieval Islamic scholars
had preserved in Latin translation, began to filter into
medieval Europe. In particular, Fibonacci [VI.6], son of
an influential administrator within the Pisan city state,
encountered al-Khwārizmī’s text and recognized not
only the impact that the Arabic number system detailed
there could have on accounting and commerce (Roman
numerals and their cumbersome rules for manipula-
tion were still widely in use) but also the importance
of al-Khwārizmī’s theoretical discussion, with its wed-
ding of geometrical proof and the algorithmic solution
of what we can interpret as first- and second-degree
equations. In his 1202 book Liber abbaci, Fibonacci
presented al-Khwārizmī’s work almost verbatim, and
extolled all of these virtues, thus effectively introducing
this knowledge and approach into the Latin West.

Fibonacci’s presentation, especially of the practi-
cal aspects of al-Khwārizmī’s text, soon became well-
known in Europe. So-called abacus schools (named
after Fibonacci’s text and not after the Chinese calculat- 
ing instrument) sprang up all over the Italian peninsula, 
particularly in the fourteenth and fifteenth centuries, 
for the training of accountants and bookkeepers in an 
increasingly mercantilistic Western world. The teach-
ers in these schools, the “maestri d’abaco,” built on
and extended the algorithms they found in Fibonacci’s 
text. Another tradition, the Cossist tradition— after the 
German word “Coss” connoting algebra, that is, “Kun- 
strechnung” or “artful calculation”— developed simul- 
taneously in the Germanic regions of Europe and aimed 
to introduce algebra into the mainstream there. 


In 1494 the Italian Luca Pacioli published (by now 
this is the operative word: Pacioli’s text is one of the 
earliest printed mathematical texts) a compendium of 
all known mathematics. By this time, the geometrical 
justifications that al-Khwārizmī and Fibonacci had pre-
sented had long since fallen from the mathematical ver-
nacular. By reintroducing them in his book, the Summa,
Pacioli brought them back to the mathematical fore. 
Not knowing of Khayyam’s work, he asserted that solu- 
tions had been discovered only in the six cases treated 
by both al-Khwārizmī and Fibonacci, even though there
had been abortive attempts to solve the cubic and even 
though he held out the hope that it could ultimately be 
solved. 

Pacioli had highlighted a key unsolved problem: 
could algorithmic solutions be determined for the var- 
ious cases of the cubic? And, if so, could these be justi- 
fied geometrically with proofs similar in spirit to those 
found in the texts of al-Khwārizmī and Fibonacci?

Among several sixteenth-century Italian mathemati-
cians who eventually managed to answer the first ques-
tion in the affirmative was Cardano [VI.7]. In his Ars
magna, or The Great Art, of 1545, he presented algo-
rithms with geometric justifications for the various
cases of the cubic, effectively completing the cube
where al-Khwārizmī and Fibonacci had completed the
square. He also presented algorithms that had been dis- 
covered by his student Ludovico Ferrari (1522-65) for 
solving the cases of the quartic. These intrigued him, 
because, unlike the algorithms for the cubic, they were 
not justified geometrically. As he put it in his book, “all 
those matters up to and including the cubic are fully 
demonstrated, but the others which we will add, either 
by necessity or out of curiosity, we do not go beyond 
barely setting out.” An algebra was breaking out of the 
geometrical shell in which it had been encased. 

5 Algebra Is Born 

This process was accelerated by the rediscovery and
subsequent translation into Latin of Diophantus’s
Arithmetica in the 1560s, with its abbreviated presenta-
tional style and ungeometrical approach. Algebra, as a
general problem-solving technique, applicable to ques-
tions in geometry, number theory, and other mathe-
matical settings, was established in Raphael Bombelli’s
[VI.8] Algebra of 1572 and, more importantly, in
Viète’s [VI.9] In artem analyticem isagoge, or Introduc-
tion to the Analytic Art, of 1591. The aim of the latter
was, in Viète’s words, “to leave no problem unsolved,”
and to this end he developed a true notation (using
vowels to denote variables and consonants to denote
coefficients) as well as methods for solving equations
in one unknown. He called his techniques “specious
logistics.”

Dimensionality, in the form of his so-called law of
homogeneity, was, however, still an issue for Viète.
As he put it, “[o]nly homogeneous magnitudes are
to be compared to one another.” The problem was
that he distinguished two types of magnitudes: “lad-
der magnitudes,” that is, variables (A side) (or $x$ in our
modern notation), (A square) (or $x^2$), (A cube) (or $x^3$),
etc.; and “compared magnitudes,” that is, coefficients
(B length) of dimension one, (B plane) of dimension
two, (B solid) of dimension three, etc. In the light of
his law of homogeneity, then, Viète could legitimately
perform the operation (A cube) + (B plane) (A side) (or
$x^3 + bx$ in our notation), since the dimension of (A cube)
is three, as is that of the product of the two-dimensional
coefficient (B plane) and the one-dimensional vari-
able (A side), but he could not legally add the three-
dimensional variable (A cube) to the two-dimensional
product of the one-dimensional coefficient (B length)
and the one-dimensional variable (A side) (or, again,
$x^3 + bx$ in our notation). Be this as it may, his “ana-
lytic art” still allowed him to add, subtract, multiply,
and divide letters as opposed to specific numbers, and
those letters, as long as they satisfied the law of homo-
geneity, could be raised to the second, third, fourth,
or, indeed, any power. He had a rudimentary algebra,
although he failed to apply it to curves.
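
The law of homogeneity behaves much like a dimension check in modern terms. The sketch below is an illustrative model only: the `Magnitude` class and its fields are hypothetical constructs, not Viète's notation. It assumes that multiplication adds dimensions and that addition is permitted only between magnitudes of equal dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Magnitude:
    name: str
    dimension: int  # 1 = side/length, 2 = square/plane, 3 = cube/solid

    def __mul__(self, other: "Magnitude") -> "Magnitude":
        # Multiplying magnitudes adds their dimensions.
        return Magnitude(f"({self.name})({other.name})",
                         self.dimension + other.dimension)

    def __add__(self, other: "Magnitude") -> "Magnitude":
        # Law of homogeneity: only like dimensions may be combined.
        if self.dimension != other.dimension:
            raise ValueError(
                f"cannot add dimension {self.dimension} "
                f"to dimension {other.dimension}")
        return Magnitude(f"{self.name} + {other.name}", self.dimension)


a_side, a_cube = Magnitude("A side", 1), Magnitude("A cube", 3)
b_length, b_plane = Magnitude("B length", 1), Magnitude("B plane", 2)

# (A cube) + (B plane)(A side): dimensions 3 and 2 + 1, so this is allowed.
legal = a_cube + b_plane * a_side

# (A cube) + (B length)(A side): dimensions 3 and 1 + 1, so this is rejected.
try:
    a_cube + b_length * a_side
    illegal_rejected = False
except ValueError:
    illegal_rejected = True
```

Both expressions read as $x^3 + bx$ in modern notation, which is exactly why the distinction disappears once magnitudes are treated as plain numbers.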

The first mathematicians to do that were Fermat
[VI.12] and Descartes [VI.11] in their independent
development of the analytic geometry so familiar to
the high-school algebra student of today. Fermat, and
others like Thomas Harriot (ca. 1560-1621) in England,
were influenced in their approaches by Viète, while Des-
cartes not only introduced our present-day notational
convention of representing variables by x’s and y’s
and constants by a’s, b’s, and c’s but also began the
arithmetization of algebra. He introduced a unit that
allowed him to interpret all geometrical magnitudes
as line segments, whether they were x’s, $x^2$’s, $x^3$’s,
$x^4$’s, or any higher power of x, thereby removing con-
cerns about homogeneity. Fermat’s main work in this
direction was a 1636 manuscript written in Latin, enti-
tled “Introduction to plane and solid loci” and circu-
lated among the early seventeenth-century mathemat-
ical cognoscenti; Descartes’s was the Geometry, writ-
ten in French as one of three appendices to his philo-
sophical tract, Discourse on Method, published in 1637.
Both were regarded as establishing the identification of 
geometrical curves with equations in two unknowns, 
or in other words as establishing analytic geometry 
and thereby introducing algebraic techniques into the 
solution of what had previously been considered geo- 
metrical problems. In Fermat’s case, the curves were 
lines or conic sections— quadratic expressions in x 
and y; Descartes did this too, but he also considered 
equations more generally, tackling questions about the 
roots of polynomial equations that were connected 
with transforming and reducing the polynomials. 

In particular, although he gave no proof or even gen-
eral statement of it, Descartes had a rudimentary ver-
sion of what we would now call the fundamental
theorem of algebra [V.15], the result that a poly-
nomial equation $x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 = 0$ of
degree n has precisely n roots over the field $\mathbb{C}$ of com-
plex numbers. For example, while he held that a given
polynomial of degree n could be decomposed into n
linear factors, he also recognized that the cubic
$x^3 - 6x^2 + 13x - 10 = 0$ has three roots: the real root 2 and
two complex roots. In his further exploration of these
issues, moreover, he developed algebraic techniques,
involving suitable transformations, for analyzing poly-
nomial equations of the fifth and sixth degrees. Liber-
ated from homogeneity concerns, Descartes was thus
able to use his algebraic techniques freely to explore
territory where the geometrically bound Cardano had
clearly been reluctant to venture. Newton [VI.14] took
the liberation of algebra from geometrical concerns a 
step further in his Arithmetica universalis (or Univer- 
sal Arithmetic) of 1707, arguing for the complete arith- 
metization of algebra, that is, for modeling algebra and 
algebraic operations on the real numbers and the usual 
operations of arithmetic. 
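
The cubic $x^3 - 6x^2 + 13x - 10 = 0$ mentioned above can be checked directly. The text names only the real root 2; the values $2 + i$ and $2 - i$ used below for the two complex roots are supplied here by dividing out the factor $x - 2$, and the function name is hypothetical.

```python
def cubic(z: complex) -> complex:
    """Evaluate z^3 - 6z^2 + 13z - 10 by Horner's rule."""
    return ((z - 6) * z + 13) * z - 10


# Real root from the text, plus the two complex roots of x^2 - 4x + 5.
candidate_roots = [2, 2 + 1j, 2 - 1j]
values = [cubic(z) for z in candidate_roots]

# The three roots give the decomposition into three linear factors:
# (x - 2)(x - (2 + i))(x - (2 - i)) = (x - 2)(x^2 - 4x + 5).
sum_of_roots = sum(candidate_roots)         # equals 6, minus the x^2 coefficient
product_of_roots = 2 * (2 + 1j) * (2 - 1j)  # equals 10, minus the constant term
```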

Descartes’s Geometry highlighted at least two prob- 
lems for further algebraic exploration: the fundamen- 
tal theorem of algebra and the solution of polyno- 
mial equations of degree greater than four. Although 
eighteenth-century mathematicians like d’Alembert
[VI.20] and Euler [VI.19] attempted proofs of the fun-
damental theorem of algebra, the first person to prove
it rigorously was Gauss [VI.26], who gave four distinct
proofs over the course of his career. His first, an alge-
braic geometrical proof, appeared in his doctoral dis-
sertation of 1799, while a second, fundamentally dif-
ferent proof was published in 1816, which in modern
terminology essentially involved constructing the poly-
nomial’s splitting field. While the fundamental theorem
of algebra established how many roots a given poly- 
nomial equation has, it did not provide insight into 
exactly what those roots were or how precisely to find 
them. That problem and its many mathematical reper- 
cussions exercised a number of mathematicians in the 
late eighteenth and nineteenth centuries and formed 
one of the strands of the mathematical thread that 
became modern algebra in the early twentieth century. 
Another emerged from attempts to understand the gen- 
eral behavior of systems of (one or more) polynomials 
in n unknowns, and yet another grew from efforts to 
approach number-theoretic questions algebraically. 

6 The Search for the Roots of Algebraic Equations

The problem of finding roots of polynomials pro-
vides a direct link from the algebra of the high-school
classroom to that of the modern research mathemati- 
cian. Today’s high-school student dutifully employs the 
quadratic formula to calculate the roots of second- 
degree polynomials. To derive this formula, one trans- 
forms the given polynomial into one that can be solved 
more easily. By more complicated manipulations of 
cubics and quartics, Cardano and Ferrari obtained for- 
mulas for the roots of those as well. It is natural to ask 
whether the same can be done for higher-degree poly- 
nomials. More precisely, are there formulas that involve 
just the usual operations of arithmetic— addition, sub- 
traction, multiplication, and division — together with 
the extraction of roots? When there is such a formula, 
one says that the equation is solvable by radicals. 
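
For the quadratic, the transformation in question is completing the square. The derivation below is the standard textbook one, given here as a worked illustration in modern notation; the coefficient names a, b, c follow the general form $ax^2 + bx + c = 0$ and assume $a \neq 0$.

```latex
\begin{align*}
ax^2 + bx + c &= 0, \qquad a \neq 0, \\
x^2 + \frac{b}{a}\,x &= -\frac{c}{a}, \\
\left(x + \frac{b}{2a}\right)^2 &= \frac{b^2}{4a^2} - \frac{c}{a}
                                 = \frac{b^2 - 4ac}{4a^2}, \\
x &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\end{align*}
```

The result uses only the four arithmetic operations and one square root, so the quadratic is solvable by radicals in the sense just defined.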

Although many eighteenth-century mathematicians
(among them Euler, Alexandre-Théophile Vander-
monde (1735-96), Waring [VI.21], and Étienne Bézout
(1730-83)) contributed to the effort to decide whether
higher-order polynomial equations are solvable by rad-
icals, it was not until the years from roughly 1770 to
1830 that there were significant breakthroughs, partic-
ularly in the work of Lagrange [VI.22], Abel [VI.33],
and Gauss.

In a lengthy set of “Réflexions sur la résolution
algébrique des équations” (Reflections on the algebraic
resolution of equations) published in 1771, Lagrange
tried to determine principles underlying the resolution
of algebraic equations in general by analyzing in detail
the specific cases of the cubic and the quartic. Build-
ing on the work of Cardano, Lagrange showed that a
cubic of the form $x^3 + ax^2 + bx + c = 0$ could always
be transformed into a cubic with no quadratic term
$x^3 + px + q = 0$ and that the roots of this could be
written as $x = u + v$, where $u^3$ and $v^3$ are the roots
of a certain quadratic polynomial equation. Lagrange
was then able to show that if $x_1$, $x_2$, $x_3$ are the three
roots of the cubic, the intermediate functions u and v
could actually be written as $u = \frac{1}{3}(x_1 + \alpha x_2 + \alpha^2 x_3)$
and $v = \frac{1}{3}(x_1 + \alpha^2 x_2 + \alpha x_3)$, for $\alpha$ a primitive cube root
of unity. That is, u and v could be written as rational
expressions or resolvents in $x_1$, $x_2$, $x_3$. Conversely,
starting with a linear expression $y = Ax_1 + Bx_2 + Cx_3$
in the roots $x_1$, $x_2$, $x_3$ and then permuting the roots
in all possible ways yielded six expressions each of
which was a root of a particular sixth-degree polyno-
mial equation. An analysis of the latter equation (which
involved the exploitation of properties of symmetric
polynomials) yielded the same expressions for u and v
in terms of $x_1$, $x_2$, $x_3$ and the cube root of unity $\alpha$. As
Lagrange showed, this kind of two-pronged analysis
(involving intermediate expressions rational in the roots
that are solutions of a solvable equation as well as the
behavior of certain rational expressions under permu-
tation of the roots) yielded the complete solution in
the cases both of the cubic and the quartic. It was one
approach that encompassed the solution of both types
of equation. But could this technique be extended to
the case of the quintic and higher-degree polynomials?
Lagrange was unable to push it through in the case of
the quintic, but by building on his ideas, first his stu-
dent Paolo Ruffini (1765-1822) at the turn of the nine-
teenth century and then, definitively, the young Norwe-
gian mathematician Abel in the 1820s showed that, in
fact, the quintic is not solvable by radicals. (See the
insolubility of the quintic [V.24].) This negative
result, however, still left open the questions of which 
algebraic equations were solvable by radicals and why. 
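
The relationship between the roots of the reduced cubic and the intermediate quantities u and v can be illustrated numerically. The sketch below is a floating-point illustration, not Lagrange's own derivation. It assumes that $u^3$ and $v^3$ are the roots of the quadratic $t^2 + qt - p^3/27 = 0$ (the usual choice, stated here as an assumption), that $p \neq 0$, and a particular labeling of the three roots. The function name and the sample cubic $x^3 - 6x - 9 = 0$ are illustrative choices.

```python
import cmath


def solve_depressed_cubic(p: float, q: float):
    """Return (u, v, alpha, roots) for x^3 + p*x + q = 0, assuming p != 0."""
    # u^3 and v^3 are taken to be the two roots of t^2 + q*t - p^3/27 = 0.
    u_cubed = -q / 2 + cmath.sqrt(q * q / 4 + p ** 3 / 27)
    u = u_cubed ** (1 / 3)                 # any one complex cube root will do
    v = -p / (3 * u)                       # chosen so that u*v = -p/3
    alpha = cmath.exp(2j * cmath.pi / 3)   # a primitive cube root of unity
    # Labeling of the roots chosen to match the resolvent formulas in the text.
    roots = [u + v, alpha**2 * u + alpha * v, alpha * u + alpha**2 * v]
    return u, v, alpha, roots


p, q = -6.0, -9.0                          # x^3 - 6x - 9 = 0 has the root 3
u, v, alpha, (x1, x2, x3) = solve_depressed_cubic(p, q)

# Each computed value should satisfy the cubic up to rounding error.
residuals = [abs(x**3 + p * x + q) for x in (x1, x2, x3)]

# Recover u and v from the roots, as rational expressions in x1, x2, x3.
u_from_roots = (x1 + alpha * x2 + alpha**2 * x3) / 3
v_from_roots = (x1 + alpha**2 * x2 + alpha * x3) / 3
```

With a different labeling of $x_2$ and $x_3$ the two resolvent expressions simply exchange roles, which is one small instance of the behavior under permutation of the roots that Lagrange analyzed.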

As Lagrange’s analysis seemed to underscore, the 
answer to this question in the cases of the cubic and the 
quartic involved in a critical way the cube and fourth 
roots of unity, respectively. By definition, these satisfy
the particularly simple polynomial equations $x^3 - 1 = 0$
and $x^4 - 1 = 0$, respectively. It was thus natural to
examine the general case of the so-called cyclotomic
equation $x^n - 1 = 0$ and ask for what values of n the
nth roots of unity are actually constructible. To put
this question in equivalent algebraic terms: for which
n is it possible to find a formula for the nth roots of
unity that expresses them in terms of integers using the
usual arithmetical operations and extraction of square
(but not higher) roots? This was one of the many ques-
tions explored by Gauss in his wide-ranging, magiste-
rial, and groundbreaking 1801 treatise Disquisitiones
arithmeticæ. One of his most famous results was that
the regular 17-gon (or, equivalently, a 17th root of
unity) was constructible. In the course of his analysis,
he not only employed techniques similar to those devel-
oped by Lagrange but also developed key concepts such
as modular arithmetic [III.60] and the properties of
the modular “worlds” $\mathbb{Z}_p$, for p a prime, and, more gen-
erally, $\mathbb{Z}_n$, for $n \in \mathbb{Z}^+$, as well as the notion of a primi-
tive element (a generator) of what would later be termed
a cyclic group.
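
The notion of a primitive element can be made concrete for the prime 17. The sketch below is an illustration in modern terms rather than Gauss's own method. It lists the residues whose powers run through every nonzero residue modulo p; the function names are hypothetical, and p is assumed to be prime.

```python
def multiplicative_order(a: int, p: int) -> int:
    """Smallest k >= 1 with a^k = 1 (mod p); assumes p prime and 0 < a < p."""
    order, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        order += 1
    return order


def primitive_elements(p: int) -> list[int]:
    """Residues whose powers generate all p - 1 nonzero residues mod p."""
    return [a for a in range(1, p) if multiplicative_order(a, p) == p - 1]


generators_mod_17 = primitive_elements(17)

# The successive powers of one generator visit every nonzero residue once.
powers_of_3 = [pow(3, k, 17) for k in range(16)]
```

That the nonzero residues modulo 17 form a single cycle of length 16, a power of 2, is closely tied to the appearance of only square roots in the construction of the 17-gon.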

Although it is not clear how well he knew Gauss’s 
work, in the years around 1830 Galois [VI.41] drew
from the ideas both of Lagrange on the analysis of 
resolvents and of Cauchy [VI.29] on permutations and
substitutions to obtain a solution to the general prob- 
lem of solvability of polynomial equations by radicals. 
Although his approach borrowed from earlier ideas, 
it was in one important respect fundamentally new. 
Whereas prior efforts had aimed at deriving an explicit 
algorithm for calculating the roots of a polynomial of a 
given degree, Galois formulated a theoretical process 
based on constructs more general than but derived 
from the given equation that allowed him to assess 
whether or not that equation was solvable. 

To be more precise, Galois recast the problem into
one in terms of two new concepts: fields (which he
called “domains of rationality”) and groups (or, more
precisely, groups of substitutions). A polynomial equa-
tion $f(x) = 0$ of degree n was reducible over its domain
of rationality (the ground field from which its coef-
ficients were taken) if all n of its roots were in that
ground field; otherwise, it was irreducible over that
field. It could, however, be reducible over some larger
field. Consider, for example, the polynomial $x^2 + 1$ as
a polynomial over $\mathbb{R}$, the field of real numbers. While
we know from high-school algebra that this polyno-
mial does not factor into a product of two real, lin-
ear factors (that is, there are no real numbers $r_1$ and
$r_2$ such that $x^2 + 1 = (x - r_1)(x - r_2)$), it does factor
over $\mathbb{C}$, the field of complex numbers, and, specifically,
$x^2 + 1 = (x + \sqrt{-1})(x - \sqrt{-1})$. Thus, if we take all
numbers of the form $a + b\sqrt{-1}$, where a and b belong
to $\mathbb{R}$, then we enlarge $\mathbb{R}$ to a new field $\mathbb{C}$ in which the
polynomial $x^2 + 1$ is reducible. If F is a field and x is an
element of F that does not have an nth root in F, then by
a similar process we can adjoin an element y to F and
stipulate that $y^n = x$. We call y a radical. The set of
all polynomial expressions in y, with coefficients in F,
can be shown to form a larger field. Galois showed that
if it was possible to enlarge F by successively adjoin-
ing radicals to obtain a field K in which $f(x)$ factored
into n linear factors, then $f(x) = 0$ was solvable by
radicals. He developed a process that hinged both on
the notion of adjoining an element (in particular, a so-
called primitive element) to a given ground field and
on the idea of analyzing the internal structure of this
new, enlarged field via an analysis of the (finite) group
of substitutions (automorphisms of K) that leave invari-
ant all rational relations of the n roots of $f(x) = 0$. The
group-theoretic aspects of Galois’s analysis were par- 
ticularly potent; he introduced the notions, although 
not the modern terminology, of a normal subgroup of a 
group, a factor group, and a solvable group. Galois thus 
resolved the concrete problem of determining when a 
polynomial equation was solvable by radicals by exam- 
ining it from the abstract perspective of groups and 
their internal structure. 
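
The process of adjoining a radical can be modeled directly. The sketch below represents elements $a + by$ with $y^2 = d$ and checks that, for $d = -1$, the adjoined element y is a root of $x^2 + 1$ and that nonzero elements have inverses. It is an illustrative model under stated assumptions: the ground field is taken to be the rationals rather than $\mathbb{R}$ so that the arithmetic is exact, d is assumed not to be a rational square, and the class and helper names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QuadraticElement:
    """An element a + b*y of F(y), where F is the rationals and y*y = d."""
    a: Fraction
    b: Fraction
    d: int

    def __add__(self, other: "QuadraticElement") -> "QuadraticElement":
        return QuadraticElement(self.a + other.a, self.b + other.b, self.d)

    def __mul__(self, other: "QuadraticElement") -> "QuadraticElement":
        # (a + b y)(c + e y) = (ac + be*d) + (ae + bc) y, using y*y = d.
        return QuadraticElement(
            self.a * other.a + self.b * other.b * self.d,
            self.a * other.b + self.b * other.a,
            self.d,
        )

    def inverse(self) -> "QuadraticElement":
        # 1/(a + b y) = (a - b y)/(a^2 - d b^2); the denominator is nonzero
        # for a nonzero element because d is assumed not to be a square.
        norm = self.a * self.a - self.d * self.b * self.b
        return QuadraticElement(self.a / norm, -self.b / norm, self.d)


def element(a: int, b: int, d: int = -1) -> QuadraticElement:
    return QuadraticElement(Fraction(a), Fraction(b), d)


y = element(0, 1)                     # the adjoined radical, with y*y = -1
y_squared_plus_one = y * y + element(1, 0)
z = element(3, 4)
z_times_inverse = z * z.inverse()
```

Closure under addition, multiplication, and inversion is what makes the enlarged system a field rather than merely a set of expressions.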

Galois’s ideas, although sketched in the early 1830s, 
did not begin to enter into the broader mathemati- 
cal consciousness until their publication in 1846 in 
Liouville’s [VI.39] Journal des Mathématiques Pures et
Appliquées, and they were not fully appreciated until
two decades later when first Joseph Serret (1819-85)
and then Jordan [VI.52] fleshed them out more fully.
In particular, Jordan’s Traité des substitutions et des
équations algébriques (“Treatise on substitutions and
on algebraic equations”) of 1870 not only highlighted
Galois's work on the solution of algebraic equations
but also developed the general structure theory of per-
mutation groups as it had evolved at the hands of
Lagrange, Gauss, Cauchy, Galois, and others. By the end
of the nineteenth century, this line of development of
group theory, stemming from efforts to solve algebraic
equations by radicals, had intertwined with three oth-
ers: the abstract notion of a group defined in terms
of a group multiplication table, which was formulated
by Cayley [VI.46], the structural work of mathemati-
cians like Ludwig Sylow (1832-1918) and Otto Hölder
(1859-1937), and the geometrical work of Lie [VI.53]
and Klein [VI.57]. By 1893, when Heinrich Weber (1842-
1914) codified much of this earlier work by giving the
first actual abstract definitions of the notions both of
group and field, thereby recasting them in a form much
more familiar to the modern mathematician, groups
and fields had been shown to be of central impor-
tance in a wide variety of areas, both mathematical and
physical.

7 Exploring the Behavior of Polynomials in n Unknowns

The problem of solving algebraic equations involved 
finding the roots of polynomials in one unknown. At
least as early as the late seventeenth century, how-
ever, mathematicians like Leibniz [VI.15] had been
interested in techniques for solving simultaneously 
systems of linear equations in more than two vari- 
ables. Although his work remained unknown at the 
time, Leibniz considered three linear equations in three 
unknowns and determined their simultaneous solvabil- 
ity based on the value of a particular expression in 
the coefficients of the system. This expression, equiva- 
lent to what Cauchy would later call the determinant 
[III.15] and which would ultimately be associated with
an $n \times n$ square array or matrix [I.3 §4.2] of coeffi-
cients, was also developed and analyzed independently
by Gabriel Cramer (1704-52) in the mid eighteenth cen- 
tury in the general context of the simultaneous solution 
of a system of n linear equations in n unknowns. From 
these beginnings, a theory of determinants, indepen- 
dent of the context of solving systems of linear equa- 
tions, quickly became a topic of algebraic study in its 
own right, attracting the attention of Vandermonde, 
Laplace [VI.23], and Cauchy, among others. Determi-
nants were thus an example of a new algebraic con-
struct, the properties of which were then systematically
explored. 
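
The role of the determinant in deciding simultaneous solvability can be seen in a small example. The sketch below applies what is now called Cramer's rule to three linear equations in three unknowns. The sample system and the function names are illustrative choices, not taken from Leibniz or Cramer.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 array by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(coeffs, rhs):
    """Solve a 3x3 linear system by Cramer's rule.

    Returns None when the determinant vanishes, that is, when the system
    has no unique solution.
    """
    det = det3(coeffs)
    if det == 0:
        return None
    solution = []
    for col in range(3):
        # Replace one column of the coefficient array by the right-hand side.
        replaced = [
            [rhs[r] if c == col else coeffs[r][c] for c in range(3)]
            for r in range(3)
        ]
        solution.append(Fraction(det3(replaced), det))
    return solution


# 2x + y - z = 8,  -3x - y + 2z = -11,  -2x + y + 2z = -3
coeffs = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
rhs = [8, -11, -3]
solution = cramer3(coeffs, rhs)

# A system with two proportional rows has determinant zero.
degenerate = cramer3([[1, 2, 3], [2, 4, 6], [1, 0, 1]], [1, 2, 3])
```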

Although determinants came to be viewed in terms of 
what Sylvester [VI.42] would dub matrices, a theory of 
matrices proper grew initially from the context not of 
solving simultaneous linear equations but rather of lin- 
early transforming the variables of homogeneous poly- 
nomials in two, three, or more generally n variables. In 
the Disquisitiones arithmeticæ, for example, Gauss con- 
sidered how binary and ternary quadratic forms with 
integer coefficients (expressions of the form
$a_1x^2 + 2a_2xy + a_3y^2$ and
$a_1x^2 + a_2y^2 + a_3z^2 + 2a_4xy + 2a_5xz + 2a_6yz$,
respectively) are affected by a linear
transformation of their variables. In the ternary case, he
applied the linear transformation $x = \alpha x' + \beta y' + \gamma z'$,
$y = \alpha' x' + \beta' y' + \gamma' z'$, and $z = \alpha'' x' + \beta'' y' + \gamma'' z'$
to derive a new ternary form. He denoted the linear
transformation of the variables by the square array

$$\begin{array}{ccc} \alpha, & \beta, & \gamma \\ \alpha', & \beta', & \gamma' \\ \alpha'', & \beta'', & \gamma'' \end{array}$$


and, in showing what the composition of two such 
transformations was, gave an explicit example of 
matrix multiplication. By the middle of the nineteenth 
century, Cayley had begun to explore matrices per se 
and had established many of the properties that the 
theory of matrices as a mathematical system in its 
own right enjoys. This line of algebraic thought was 
eventually reinterpreted in terms of the theory of alge- 
bras (see below) and developed into the independent 
area of linear algebra and the theory of vector spaces 
[I.3 §2.3].
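
The observation that composing two linear substitutions amounts to multiplying their square arrays can be checked on a small case. The sketch below uses arbitrary illustrative 3 × 3 integer arrays, not Gauss's own example, and the function names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square arrays of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(A, v):
    """Apply the linear substitution with array A to the variables v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


A = [[1, 2, 0], [0, 1, 3], [1, 0, 1]]   # (x, y, z) in terms of (x', y', z')
B = [[2, 0, 1], [1, 1, 0], [0, 1, 1]]   # (x', y', z') in terms of (x'', y'', z'')
v = [1, -2, 3]                          # sample values of (x'', y'', z'')

step_by_step = apply(A, apply(B, v))    # substitute twice in succession
composed = apply(mat_mul(A, B), v)      # substitute once with the product array
```

Both routes give the same values, which is the content of the rule for matrix multiplication.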

Another theory that arose out of the analysis of lin- 
ear transformations of homogeneous polynomials was 
the theory of invariants, and this too has its origins in 
some sense in Gauss’s Disquisitiones. As in his study 
of ternary quadratic forms, Gauss began his study 
of binary forms by applying a linear transformation, 
specifically, $x = \alpha x' + \beta y'$, $y = \gamma x' + \delta y'$. The result
was the new binary form $a_1'(x')^2 + 2a_2'x'y' + a_3'(y')^2$,
where, explicitly, $a_1' = a_1\alpha^2 + 2a_2\alpha\gamma + a_3\gamma^2$,
$a_2' = a_1\alpha\beta + a_2(\alpha\delta + \beta\gamma) + a_3\gamma\delta$, and
$a_3' = a_1\beta^2 + 2a_2\beta\delta + a_3\delta^2$.
As Gauss noted, if you multiply the second of
these equations by itself and subtract from this the
product of the first and the third equations, you obtain
the relation ${a_2'}^2 - a_1'a_3' = (a_2^2 - a_1a_3)(\alpha\delta - \beta\gamma)^2$. To
use language that Sylvester would develop in the early
1850s, Gauss realized that the expression $a_2^2 - a_1a_3$ in
the coefficients of the original binary quadratic form 
is an invariant in the sense that it remains unchanged 
up to a power of the determinant of the linear trans- 
formation. By the time Sylvester coined the term, the 
invariant phenomenon had also appeared in the work 
of the English mathematician Boole [VI.43], and had
attracted Cayley’s attention. It was not until after Cay- 
ley and Sylvester met in the late 1840s, however, that 
the two of them began to pursue a theory of invari- 
ants proper, which aimed to determine all invariants for 
homogeneous polynomials of degree m in n unknowns 
as well as simultaneous invariants for systems of such 
polynomials. 
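
The invariance of $a_2^2 - a_1a_3$ up to a power of the determinant can be checked numerically for a binary quadratic form $a_1x^2 + 2a_2xy + a_3y^2$ under the substitution $x = \alpha x' + \beta y'$, $y = \gamma x' + \delta y'$. The sketch below uses arbitrary sample integers chosen for illustration; the function names are hypothetical.

```python
def transform_binary_form(a1, a2, a3, alpha, beta, gamma, delta):
    """Coefficients of the form after x = alpha x' + beta y', y = gamma x' + delta y'."""
    a1p = a1 * alpha**2 + 2 * a2 * alpha * gamma + a3 * gamma**2
    a2p = a1 * alpha * beta + a2 * (alpha * delta + beta * gamma) + a3 * gamma * delta
    a3p = a1 * beta**2 + 2 * a2 * beta * delta + a3 * delta**2
    return a1p, a2p, a3p


def invariant(a1, a2, a3):
    """The expression a2^2 - a1*a3 in the coefficients of the form."""
    return a2**2 - a1 * a3


form = (1, 3, 2)                  # the form x^2 + 6xy + 2y^2
alpha, beta, gamma, delta = 2, 1, 1, 3
new_form = transform_binary_form(*form, alpha, beta, gamma, delta)
determinant = alpha * delta - beta * gamma
```

For these values the invariant changes from 7 to 175, that is, by the factor $5^2$, the square of the determinant of the substitution.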

Although Cayley and (especially) Sylvester pursued 
this line of research from a purely algebraic point of 
view, invariant theory also had number-theoretic and 
geometric implications, the former explored by Got-
thold Eisenstein (1823-52) and Hermite [VI.47], the
latter by Otto Hesse (1811-74), Paul Gordan (1837-
1912), and Alfred Clebsch (1833-72), among others.
It was of particular interest to understand how many
“genuinely distinct” invariants were associated with a
specific form, or system of forms. In 1868, Gordan
achieved a fundamental breakthrough by showing that
the invariants associated with any binary form in n vari-
ables can always be expressed in terms of a finite num-
ber of them. By the late 1880s and early 1890s, how-
ever, Hilbert [VI.63] brought new, abstract concepts
associated with the theory of algebras (see below) to 
bear on invariant theory and, in so doing, not only re- 
proved Gordan’s result but also showed that the result 
was true for forms of degree m in n unknowns. With 
Hilbert’s work, the emphasis shifted from the concrete 
calculations of his English and German predecessors 
to the kind of structurally oriented existence theorems 
that would soon be associated with abstract, modern 
algebra. 

8 The Quest to Understand the Properties of “Numbers”

As early as the sixth century B.C.E., the Pythagoreans
had studied the properties of numbers formally. For
example, they defined the concept of a perfect num-
ber, which is a positive integer, such as 6 = 1 + 2 + 3
and 28 = 1 + 2 + 4 + 7 + 14, which is the sum of its
divisors (excluding the integer itself). In the sixteenth
century, Cardano and Bombelli had willingly worked
with new expressions, complex numbers, of the form
$a + \sqrt{-b}$, for real numbers a and b, and had explored
their computational properties. In the seventeenth cen-
tury, Fermat famously claimed that he could prove that
the equation $x^n + y^n = z^n$, for n an integer greater
than 2, had no solutions in the integers, except for the
trivial cases when z = x or z = y and the remaining
variable is zero. The latter result, known as Fermat’s
last theorem [V.12], generated many new ideas, espe-
cially in the eighteenth and nineteenth centuries, as
mathematicians worked to find an actual proof of Fer- 
mat’s claim. Central to their efforts were the creation 
and algebraic analysis of new types of number systems 
that extended the integers in much the same way that 
Galois had extended fields. This flexibility to create and 
analyze new number systems was to become one of the 
hallmarks of modern algebra as it would develop into 
the twentieth century. 

One of the first to venture down this path was Euler. 
In the proof of Fermat's last theorem for the n = 3 
case that he gave in his Elements of Algebra of 1770, 
Euler introduced the system of numbers of the form 
$a + b\sqrt{-3}$, where a and b are integers. He then blithely
proceeded to factorize them into primes, without further
justification, just as he would have factorized
ordinary integers. By the 1820s and 1830s, Gauss had
launched a more systematic study of numbers that are 
now called the Gaussian integers. These are all numbers
of the form $a + b\sqrt{-1}$, for integers a and b. He
showed that, like the integers, the Gaussian integers are
closed under addition, subtraction, and multiplication;
he defined the notions of unit, prime, and norm in order
to prove an analogue of the fundamental theorem
of arithmetic [V.16] for them. He thereby demonstrated
that there were whole new algebraic worlds to
create and explore. (See algebraic numbers [IV.3] for 
more on these topics.) 

Whereas Euler had been motivated in his work by 
Fermat’s last theorem, Gauss was trying to generalize 
the law of quadratic reciprocity [V.30] to a law of 
biquadratic reciprocity. In the quadratic case, the problem
was the following. If a and m are integers with
$m \geq 2$, then we say that a is a quadratic residue mod m
if the equation $x^2 = a$ has a solution mod m; that is,
if there is an integer x such that $x^2$ is congruent to
a mod m. Now suppose that p and q are distinct odd
primes. If you know whether p is a quadratic residue 
mod q, is there a simple way of telling whether q is a 
quadratic residue mod p? In 1785, Legendre had posed 
and answered this question — the status of q mod p 
will be the same as that of p mod q if at least one 
of p and q is congruent to 1 mod 4, and different if 
they are both congruent to 3 mod 4— but he had given 
a faulty proof. By 1796, Gauss had come up with the 
first rigorous proof of the theorem (he would ultimately 
give eight different proofs of it), and by the 1820s he 
was asking the analogous question for the case of two 
biquadratic equivalences $x^4 \equiv p \pmod{q}$ and $y^4 \equiv q \pmod{p}$.
It was in his attempts to answer this new question
that he introduced the Gaussian integers and signaled
at the same time that the theory of residues of
higher degrees would make it necessary to create and 
analyze still other new sorts of “integers.” Although 
Eisenstein, Dirichlet [VI.36], Hermite, Kummer [VI.40],
and Kronecker [VI.48], among others, pushed these
ideas forward in this Gaussian spirit, it was Dedekind
[VI.50] in his tenth supplement to Dirichlet’s Vorlesungen
über Zahlentheorie (Lectures on Number Theory)
of 1871 who fundamentally reconceptualized the prob- 
lem by treating it not number theoretically but rather 
set theoretically and axiomatically. Dedekind intro- 
duced, for example, the general notions— if not what 
would become the precise axiomatic definitions— of 
fields, rings, ideals [III.83 §2], and modules [III.83 §3] 
and analyzed his number-theoretic setting in terms of
these new, abstract constructs. His strategy was, from
a philosophical point of view, not unlike that of Galois:
translate the “concrete” problem at hand into new,
more abstract terms in order to solve it more cleanly
at a “higher” level. In the early twentieth century,
Noether [VI.76] and her students, among them Bartel
van der Waerden (1903-96), would develop Dedekind’s 
ideas further to help create the structural approach to 
algebra so characteristic of the twentieth century. 
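
The quadratic reciprocity statement described in this paragraph can be checked numerically for small primes. The following is a minimal sketch, assuming a brute-force test for quadratic residues; the function names are illustrative and are not taken from the text.

```python
def is_quadratic_residue(a, m):
    """Return True if x^2 = a has a solution mod m (brute-force search)."""
    return any((x * x - a) % m == 0 for x in range(m))


def reciprocity_holds(p, q):
    """Check Legendre's statement for two distinct odd primes p and q.

    The status of q mod p should equal that of p mod q exactly when
    at least one of p and q is congruent to 1 mod 4.
    """
    same_status = is_quadratic_residue(p, q) == is_quadratic_residue(q, p)
    expect_same = (p % 4 == 1) or (q % 4 == 1)
    return same_status == expect_same


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print(all(reciprocity_holds(p, q)
          for p in odd_primes for q in odd_primes if p != q))
```

A check of this kind is only evidence for finitely many cases; it is no substitute for the proofs that Gauss supplied.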

Parallel to this nineteenth-century, number-theoretic 
evolution of the notion of “number” on the continent of 
Europe, a very different set of developments was taking 
place, initially in the British Isles. From the late eigh- 
teenth century, British mathematicians had debated 
not only the nature of number — questions such as, 
“Do negative and imaginary numbers make sense?” — 
but also the meaning of algebra— questions like, “In an 
expression like ax + by, what values may a, b, x, and 
y legitimately take on and what precisely may “+” connote?”
By the 1830s, the Irish mathematician Hamilton
[VI.37] had come up with a “unified” interpretation of
the complex numbers that circumvented, in his view, 
the logical problem of adding a real number and an 
imaginary one, an apple and an orange. Given real numbers
a and b, Hamilton conceived of the complex number
$a + b\sqrt{-1}$ as the ordered pair (he called it a “couple”)
(a, b). He then defined addition, subtraction, multiplication,
and division of such couples. As he realized,
this also provided a way of representing numbers in 
the complex plane, and so he naturally asked whether 
he could construct algebraic, ordered triples so as to 
represent points in 3-space. After a decade of con- 
templating this question off and on, Hamilton finally 
answered it not for triples but for quadruples, the so-called
quaternions [III.78], “numbers” of the form
$(a, b, c, d) := a + bi + cj + dk$, where a, b, c, and d are real
and where i, j, k satisfy the relations $ij = -ji = k$,
$jk = -kj = i$, $ki = -ik = j$, $i^2 = j^2 = k^2 = -1$. As in the
two-dimensional case, addition is defined component-wise,
but multiplication, while definable in such a way that 
every nonzero element has a multiplicative inverse, is 
not commutative. Thus, this new number system did 
not obey all of the “usual” laws of arithmetic. 
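
The multiplication rule implied by the relations above can be written out explicitly. The following is a minimal sketch, assuming quaternions are stored as 4-tuples (a, b, c, d); the names qmul and qinv are illustrative. It checks that ij and ji differ and that a nonzero quaternion has a multiplicative inverse.

```python
from fractions import Fraction


def qmul(p, q):
    """Multiply quaternions given as 4-tuples (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qinv(q):
    """Inverse of a nonzero quaternion with rational entries: conjugate / norm."""
    a, b, c, d = q
    norm = Fraction(a * a + b * b + c * c + d * d)
    if norm == 0:
        raise ZeroDivisionError("the zero quaternion has no inverse")
    return (a / norm, -b / norm, -c / norm, -d / norm)


i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j), qmul(j, i))        # k and -k: multiplication is not commutative
q = (1, 2, 3, 4)
print(qmul(q, qinv(q)))              # the quaternion 1
```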

Although some of Hamilton’s British contemporaries 
questioned the extent to which mathematicians were
free to create such new mathematical worlds, oth- 
ers, like Cayley, immediately took the idea further 
and created a system of ordered 8-tuples, the octonions,
the multiplication of which was neither commutative
nor even, as was later discovered, associative.
Several questions naturally arise about such systems,
but one that Hamilton asked was what happens
if the field of coefficients, the base field, is not
the reals but rather the complexes? In that case, it
is easy to see that the product of the two nonzero
complex quaternions $(-\sqrt{-1}, 0, 1, 0) = -\sqrt{-1} + j$ and
$(\sqrt{-1}, 0, 1, 0) = \sqrt{-1} + j$ is $1 + j^2 = 1 + (-1) = 0$. In
other words, the complex quaternions contain zero 
divisors— nonzero elements the product of which is 
zero— another phenomenon that distinguishes their 
behavior fundamentally from that of the integers. As 
it flourished in the hands of mathematicians like Benjamin
Peirce (1809-80), Frobenius [VI.58], Georg Scheffers
(1866-1945), Theodor Molien (1861-1941), Cartan
[VI.69], and Joseph H. M. Wedderburn (1882-
1948), among others, this line of thought resulted in 
a freestanding theory of algebras. This naturally inter- 
twined with developments in the theory of matrices 
(the $n \times n$ matrices form an algebra of dimension $n^2$
over their base field) as it had evolved through the
work of Gauss, Cayley, and Sylvester. It also merged
with the not unrelated theory of n-dimensional vector
spaces (n-dimensional algebras are n-dimensional vec- 
tor spaces with a vector multiplication as well as a vec- 
tor addition and scalar multiplication) that issued from 
ideas like those of Hermann Grassmann (1809-77). 
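
The zero-divisor computation for complex quaternions mentioned in this paragraph can be reproduced directly. The following is a minimal sketch, assuming the complex scalar $\sqrt{-1}$ is modelled by Python's 1j, which commutes with the quaternion units and is distinct from the unit i; the function name is illustrative.

```python
def qmul(p, q):
    """Multiply quaternions (a, b, c, d) = a + bi + cj + dk; entries may be complex."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


u = (-1j, 0, 1, 0)   # -sqrt(-1) + j
v = (1j, 0, 1, 0)    #  sqrt(-1) + j
product = qmul(u, v)
print(all(component == 0 for component in product))  # both factors nonzero, product zero
```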

9 Modern Algebra 

By 1900, many new algebraic structures had been iden- 
tified and their properties explored. Structures that 
were first isolated in one context were then found to 
appear, sometimes unexpectedly, in others: thus, these 
new structures were mathematically more general than 
the problems that had led to their discovery. In the 
opening decades of the twentieth century, algebraists 
(the term is not ahistorical by 1900) increasingly recognized
these commonalities, these shared structures
such as groups, fields, and rings, and asked questions
at a more abstract level. For example, what are all of
the finite simple groups? Can they be classified? (See
the classification of finite simple groups [V.8].)
Moreover, inspired by the set-theoretic and axiomatic 
work of Cantor [VI.54], Hilbert, and others, they came
to appreciate the common standard of analysis and
comparison that axiomatization could provide. Coming
from this axiomatic point of view, Ernst Steinitz (1871-1928),
for example, laid the groundwork for an abstract
theory of fields in 1910, while Abraham Fraenkel (1891-1965)
did the same for an abstract theory of rings four
years later. As van der Waerden came to realize in the
late 1920s, these developments could be interpreted as
dovetailing philosophically with results like Hilbert’s in 
invariant theory and Dedekind’s and Noether’s in the 
algebraic theory of numbers. That interpretation, laid 
out in 1930 in van der Waerden’s classic textbook Mod- 
erne Algebra, codified the structurally oriented “mod- 
ern algebra” that subsumed the algebra of polynomials 
of the high-school classroom and that continues to 
characterize algebraic thought today.

II.4 Algorithms

Jean-Luc Chabert 


1 What Is an Algorithm? 

It is not easy to give a precise definition of the word 
“algorithm.” One can provide approximate synonyms:
some other words that (sometimes) mean roughly the
same thing are “rule,” “technique,” “procedure,” and 
“method.” One can also give good examples, such as 
long multiplication, the method one learns in high 
school for multiplying two positive integers together. 
However, although informal explanations and well- 
chosen examples do give a good idea of what an algo- 
rithm is, the concept has undergone a long evolution: it 
was not until the twentieth century that a satisfactory 
formal definition was achieved, and ideas about algo- 
rithms have evolved further even since then. In this arti- 
cle, we shall try to explain some of these developments 
and clarify the contemporary meaning of the term. 

1.1 Abacists and Algorists 

Returning to the example of multiplication, an obvi- 
ous point is that how you try to multiply two numbers 
together is strongly influenced by how you represent 
those numbers. To see this, try multiplying the Roman 
numerals CXLVII and XXIX together without first con- 
verting them into their decimal counterparts, 147 and 
29. It is difficult and time-consuming, and explains why 
arithmetic in the Roman empire was extremely rudi- 
mentary. A numeration system can be additive, as it 
was for the Romans, or positional, like ours today. If it 
is positional, then it can use one or several bases— for 
instance, the Sumerians used both base 10 and base 60.

For a long time, many processes of calculation used 
abacuses. Originally, these were lines traced on sand, 
onto which one placed stones (the Latin for small stone 
is calculus) to represent numbers. Later there were 
counting tables equipped with rows or columns onto 
which one placed tokens. These could be used to rep- 
resent numbers to a given base. For example, if the 
base was 10, then a token would represent one unit, 
ten units, one hundred units, etc., according to which 
row or column it was in. The four arithmetic operations 
could then be carried out by moving the tokens accord- 
ing to precise rules. The Chinese counting frame can be 
regarded as a version of the abacus. 

In the twelfth century, when the Arabic mathemati- 
cal works were translated into Latin, the denary posi- 
tional numeration system spread through Europe. This 
system was particularly suitable for carrying out the 
arithmetic operations, and led to new methods of cal- 
culation. The term algoritmus was introduced to refer 
to these, and to distinguish them from the traditional 
methods that used tokens on an abacus. 

Although the signs for the numerals had been 
adapted from Indian practice, the numerals became
known as Arabic. And the origin of the word “algorithm”
is Arabic: it arose from a distortion of the name
al-Khwārizmī [VI.5], who was the author of the oldest
known work on algebra, in the first half of the ninth
century. His treatise, entitled al-Kitāb al-mukhtaṣar fī
ḥisāb al-jabr wa’l-muqābala (“The compendious book
on calculation by completion and balancing”), gave rise 
to the word “algebra.” 

1.2 Finiteness 

As we have just seen, in the Middle Ages the term “algo- 
rithm” referred to the processes of calculation based 
on the decimal notation for the integers. However, in 
the seventeenth century, according to d’Alembert’s
[VI.20] Encyclopédie, the word was used in a more gen- 
eral sense, referring not just to arithmetic but also to 
methods in algebra and to other calculational proce- 
dures such as “the algorithm of the integral calculus” 
or “the algorithm of sines.” 

Gradually, the term came to mean any process of sys- 
tematic calculation that could be carried out by means 
of very precise rules. Finally, with the growing role of 
computers, the important role of finiteness was fully 
understood: it is essential that the process stops and 
provides a result after a finite time. Thus one arrives at 
the following naive definition: 

An algorithm is a set of finitely many rules for manipulating
a finite amount of data in order to produce a
result in a finite number of steps. 

Note the insistence on finiteness: finiteness in the writ- 
ing of the algorithm and finiteness in the implementa- 
tion of the algorithm. 

The formulation above is not of course a mathemat- 
ical definition in the classical sense of the term. As we 
shall see later, it was important to formalize it further. 
But for now, let us be content with this “definition” 
and look at some classical examples of algorithms in 
mathematics. 

2 Three Historical Examples 

A feature of algorithms that we have not yet mentioned 
is iteration, or the repetition of simple procedures. To 
see why iteration is important, consider once again the 
example of long multiplication. This is a method that 
works for positive integers of any size. As the num- 
bers get larger, the procedure takes longer, but — and 
this is of vital importance— the method is “the same”: 
if you understand how to multiply two three-digit numbers
together, then you do not need to learn any new
principles in order to multiply two 137-digit numbers
together (even if you might be rather reluctant to do 
the calculation). The reason for this is that the method 
for long multiplication involves a great deal of carefully 
structured repetition of much smaller tasks, such as 
multiplying two one-digit numbers together. We shall 
see that iteration plays a very important part in the 
algorithms to be discussed in this section. 
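
The structured repetition in long multiplication can be made explicit in code. The following is a minimal sketch, assuming the two numbers are supplied as strings of decimal digits; the function name is illustrative. The only arithmetic it performs on the inputs is the multiplication of two one-digit numbers, together with carrying.

```python
def long_multiply(x, y):
    """Multiply two non-negative integers given as decimal strings."""
    a = [int(ch) for ch in reversed(x)]   # least significant digit first
    b = [int(ch) for ch in reversed(y)]
    result = [0] * (len(a) + len(b))
    for i, digit_a in enumerate(a):
        carry = 0
        for j, digit_b in enumerate(b):
            total = result[i + j] + digit_a * digit_b + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b)] += carry       # final carry of this row
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


print(long_multiply("147", "29"))
```

The same two nested loops handle three-digit and 137-digit inputs alike; only the number of repetitions changes.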

2.1 Euclid’s Algorithm: Iteration 

One of the best, and most often used, examples to illustrate
the nature of algorithms is Euclid's algorithm
[III.22], which goes back to the third century b.c.e. It
is a procedure described by Euclid [VI.2] to determine
the greatest common divisor (gcd) of two positive integers
a and b. (Sometimes the greatest common divisor
is known as the highest common factor (hcf).) 

When one first meets the concept of the greatest com- 
mon divisor of a and b, it is usually defined to be the 
largest positive integer that is a divisor (or factor) of 
both a and b. However, for many purposes it is more 
convenient to think of it as the unique positive inte- 
ger d with the following two properties. First, d is a 
divisor of a and b, and second, if c is any other divi- 
sor of a and b, then d is divisible by c. The method for 
determining d is provided by the first two propositions 
of Book VII of Euclid’s Elements. Here is the first one: 
“Two unequal numbers being set out, and the less being 
continually subtracted in turn from the greater, if the 
number which is left never measures the one before it 
until a unit is left, the original numbers will be prime 
to one another.” In other words, if by carrying out suc- 
cessive alternate subtractions one obtains the number 
1, then the gcd of the two numbers is equal to 1. In this
case one says that the numbers are relatively prime or 
coprime. 

2.1.1 Alternate Subtractions 

Let us describe Euclid’s procedure in general. It is based 
on two simple observations: 

(i) if a = b then the gcd of a and b is b (or a);

(ii) d is a common divisor of a and b if and only if it
is a common divisor of a - b and b, which implies
that the gcd of a and b is the same as the gcd of
a - b and b.

Now suppose that we wish to determine the gcd of a
and b and suppose that $a \geq b$. If a = b then observation
(i) tells us that the gcd is b. Otherwise, observation
(ii) tells us that the answer will be the same as it is
for the two numbers a - b and b. If we now let $a_1$ be
the larger of these two numbers and $b_1$ the smaller (of
course, if they are equal then we just set $a_1 = b_1 = b$),
then we are faced with the same task that we started
with (to determine the gcd of two numbers), but the
larger of these two numbers, $a_1$, is smaller than a, the
larger of the original two numbers. We can therefore
repeat the process: if $a_1 = b_1$ then the gcd of $a_1$ and
$b_1$, and hence that of a and b, is $b_1$, and otherwise
we replace $a_1$ by $a_1 - b_1$ and reorganize the numbers
$a_1 - b_1$ and $b_1$ so that if one of them is larger then it
comes first.

Figure 1 A flow chart for the procedure in Euclid’s algorithm.

One further observation is needed if we want to show 
that this procedure works. It is the following fundamental
fact about the positive integers, sometimes known
as the well-ordering principle.

(iii) A strictly decreasing sequence of positive integers
$a_0 > a_1 > a_2 > \cdots$ must be finite.

Since the iterative procedure just described produces
exactly such a strictly decreasing sequence, the iterations
must eventually stop, which means that at some
point $a_k$ and $b_k$ will be equal, and that value is thus the
gcd of a and b (see figure 1).
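
The procedure of alternate subtractions translates almost word for word into a loop. The following is a minimal sketch, assuming both inputs are positive integers; the function name and the step counter are illustrative additions.

```python
def gcd_by_subtraction(a, b):
    """Return (gcd, number of subtractions) using observations (i) and (ii)."""
    if a <= 0 or b <= 0:
        raise ValueError("both numbers must be positive integers")
    steps = 0
    while a != b:            # observation (i): stop as soon as the two are equal
        if a < b:
            a, b = b, a      # keep the larger number first
        a -= b               # observation (ii): gcd(a, b) = gcd(a - b, b)
        steps += 1
    return a, steps


print(gcd_by_subtraction(12, 18))
```

The loop must terminate because the larger of the two numbers strictly decreases whenever a subtraction is carried out and never increases, which is the content of observation (iii).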

2.1.2 Euclidean Divisions

Euclid's algorithm is usually described in a slightly different
way. One makes use of a more complex procedure
called Euclidean division (that is, division with
remainder), which greatly reduces the number of steps
that the algorithm takes. The basic fact underlying this
procedure is that if a and b are two positive integers
then there are (unique) integers q and r such that

$a = bq + r$ and $0 \leq r < b$.

The number q is called the quotient and r is the remainder.
Remarks (i) and (ii) above are then replaced by the
following ones:

(i') if r = 0 then the gcd of a and b is equal to b;

(ii') the gcd of a and b is the same as the gcd of b and r.

This time, at the first step, one replaces (a, b) by (b, r).
If $r \neq 0$, then at the second step one replaces (b, r) by
$(r, r_1)$, where $r_1$ is the remainder in the division of b
by r, and so on. The sequence of remainders is strictly
decreasing ($b > r > r_1 > r_2 > \cdots \geq 0$), so the process stops
and the gcd is the last nonzero remainder.

It is not hard to see that the two approaches are 
equivalent. Suppose, for example, that a = 103 438 and 
b = 37. If you use the first approach, then you will 
repeatedly subtract 37 from 103 438 until you reach a 
number that is smaller than 37. This number will be the
remainder when 103 438 is divided by 37, which is the 
first number you would calculate if you used the second 
approach. Thus, the reason for the second approach is 
that repeated subtraction can be a very inefficient way 
of calculating remainders. This efficiency gain is very 
important in practice: the second approach gives rise 
to a polynomial-time algorithm [IV.21 §2], while the
time taken by the first can be exponential in the number of digits of the input.
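
The version with Euclidean division is equally short in code. The following is a minimal sketch, assuming positive integer inputs; the function name and the step counter are illustrative. Python's % operator supplies the remainder r with 0 ≤ r < b.

```python
def gcd_by_division(a, b):
    """Return (gcd, number of divisions) using remarks (i') and (ii')."""
    if a <= 0 or b <= 0:
        raise ValueError("both numbers must be positive integers")
    steps = 0
    while b != 0:
        a, b = b, a % b      # replace (a, b) by (b, r)
        steps += 1
    return a, steps          # a now holds the last nonzero remainder


print(gcd_by_division(103438, 37))
```

For the pair 103438 and 37 discussed above, the first division alone does the work of several thousand subtractions, which is where the gain in efficiency comes from.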

2.1.3 Generalizations 

Euclid’s algorithm can be generalized to many other 
contexts where we have notions of addition, subtrac- 
tion, and multiplication. For example, there is a variant 
of it that applies to the ring [III.83 §1] $\mathbb{Z}[i]$ of Gaussian
integers, that is, numbers of the form a + bi, where a
and b are ordinary integers. It can also be applied to the
ring of all polynomials with real coefficients (or coefficients
in any field, for that matter). The one requirement
is that we should be able to find some analogue
of the notion of division with remainder, after which 
the algorithm is virtually identical to the algorithm for 
positive integers. For example, we have the following 
statement for polynomials: given any two polynomials 
A and B with B not the zero polynomial, there are poly- 
nomials Q and R such that A = BQ+R and either R = 0 
or the degree of R is less than the degree of B.

As Euclid noticed (Elements, Book X, proposition 2),
one may also carry out the procedure on pairs of num- 
bers a and b that are not necessarily integers. It is easy 
to check that the process will stop if and only if the 
ratio a/b is a rational number. This observation leads 
to the concept of continued fractions [III.22], which 
are discussed in part III. They were not studied explic- 
itly before the seventeenth century, but the roots of the 
idea can be traced back to Archimedes [VI.3].
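
To illustrate the polynomial case described earlier in this subsection, here is a minimal sketch of division with remainder and of the resulting gcd computation. It assumes polynomials with rational coefficients, stored as coefficient lists with the highest degree first and with the empty list standing for the zero polynomial; these conventions and the function names are choices made for the example.

```python
from fractions import Fraction


def strip_leading_zeros(poly):
    """Drop zero coefficients from the highest-degree end of a coefficient list."""
    i = 0
    while i < len(poly) and poly[i] == 0:
        i += 1
    return poly[i:]


def poly_divmod(A, B):
    """Return (Q, R) with A = B*Q + R and either R = 0 or deg R < deg B."""
    A = strip_leading_zeros([Fraction(c) for c in A])
    B = strip_leading_zeros([Fraction(c) for c in B])
    if not B:
        raise ZeroDivisionError("B must not be the zero polynomial")
    Q, R = [], A
    while len(R) >= len(B):
        factor = R[0] / B[0]
        Q.append(factor)
        for i, c in enumerate(B):
            R[i] -= factor * c
        R.pop(0)             # the leading coefficient has just been cancelled
    return Q, strip_leading_zeros(R)


def poly_gcd(A, B):
    """Euclid's algorithm for polynomials; the result has leading coefficient 1."""
    A = strip_leading_zeros([Fraction(c) for c in A])
    B = strip_leading_zeros([Fraction(c) for c in B])
    while B:
        _, R = poly_divmod(A, B)
        A, B = B, R
    return [c / A[0] for c in A]


# gcd of x^2 - 1 and x^2 - 2x + 1 is x - 1
print(poly_gcd([1, 0, -1], [1, -2, 1]))
```

The loop in poly_gcd has the same shape as the integer version: only the notion of division with remainder has changed.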

2.2 The Method of Archimedes to Calculate π: Approximation and Finiteness

The ratio of the circumference of a circle to the diameter
is a constant that has been denoted by π since
the eighteenth century (see the article “π” in part III).
Let us see how Archimedes, in the third century b.c.e.,
obtained the classical approximation 22/7 for this ratio.
If one draws inscribed polygons (whose vertices lie on
the circle) and circumscribed polygons (whose sides are
tangent to the circle) and if one computes the length
of these polygons, then one obtains lower and upper
bounds for the value of π, since the circumference of
the circle is greater than the length of any inscribed
polygon and less than the length of any circumscribed
polygon (figure 2). Archimedes started with regular
hexagons, and then repeatedly doubled the number of
sides, obtaining more and more precise bounds. He
finished with ninety-six-sided polygons, obtaining the
estimates

$3 + \tfrac{10}{71} < \pi < 3 + \tfrac{1}{7}$.

This process clearly involves iteration, but is it right 
to call it an algorithm? Strictly speaking it is not: however
many sides you take for your polygon, all you
will get is an approximation to π, so the process is
not finite. However, what we do have is an algorithm
that will calculate π to any desired accuracy: for example,
if you demand an approximation that is correct
to ten decimal places, then after a finite number of
steps the algorithm will give you one. What matters now
is that the process converges. That is, it is important
that the values that come out of the iteration get arbitrarily
close to π. The geometric origin of the method
can be used to prove that this is indeed the case, and
in 1609 in Germany Ludolph van Ceulen obtained an
approximation accurate to thirty-five decimal places
using polygons with $2^{62}$ sides.
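
The doubling process can be imitated numerically. The following is a minimal sketch and not a reconstruction of Archimedes' own arithmetic: it assumes a circle of radius 1, works with half-perimeters, uses floating-point square roots where Archimedes used rational bounds, and relies on the standard modern recurrence in which the new circumscribed value is the harmonic mean of the old pair and the new inscribed value is the geometric mean of the new circumscribed value and the old inscribed one. The function name is illustrative.

```python
import math


def archimedes_bounds(doublings):
    """Return (sides, lower, upper) bounds for pi after doubling a hexagon."""
    upper = 2 * math.sqrt(3)   # half-perimeter of the circumscribed hexagon
    lower = 3.0                # half-perimeter of the inscribed hexagon
    sides = 6
    for _ in range(doublings):
        upper = 2 * upper * lower / (upper + lower)   # harmonic mean
        lower = math.sqrt(upper * lower)              # geometric mean
        sides *= 2
    return sides, lower, upper


sides, lower, upper = archimedes_bounds(4)
print(sides, lower, upper)
```

Four doublings lead from the hexagon to the ninety-six-sided polygons, and the resulting pair lies inside the interval given by the estimates above. Asking for more doublings tightens the bounds, which is the sense in which the process computes π to any desired accuracy.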

Nevertheless, there is a clear difference between this 
algorithm for approximating π and Euclid’s algorithm
for calculating the gcd of two positive integers. Algorithms
like Euclid’s are often called discrete algorithms,
and are contrasted with numerical algorithms, which
are algorithms that are used to compute numbers that
are not integers (see numerical analysis [IV.20]).

2.3 The Newton-Raphson Method: Recurrence Formulas

In around 1670, Newton [VI.14] devised a method for
finding roots of equations, which he explained with reference
to the example $x^3 - 2x - 5 = 0$. His explanation
starts with the observation that the root x is approximately
equal to 2. He therefore writes x = 2 + p and
obtains an equation for p by substituting 2 + p for x in
the original equation. This new equation works out to
be $p^3 + 6p^2 + 10p - 1 = 0$. Because x is close to 2, p is
small, so he then estimates p by forgetting the terms
$p^3$ and $6p^2$ (since these should be considerably smaller
than $10p - 1$). This gives him the equation $10p - 1 = 0$,
or $p = \tfrac{1}{10}$. Of course, this is not an exact solution, but it
provides him with a new and better approximation, 2.1,
for x. He then repeats the process, writing x = 2.1 + q,
substituting to obtain an equation for q, solving this
equation approximately, and refining his estimate still
further. The estimate he obtains for q is -0.0054, so
the next approximation for x is 2.0946.
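
Newton's substitution can be followed step by step with exact rational arithmetic. The following is a minimal sketch for the specific cubic $x^3 - 2x - 5$; the function names are illustrative, and the coefficients come from expanding $(a + p)^3 - 2(a + p) - 5$ in powers of p.

```python
from fractions import Fraction


def shifted_coefficients(a):
    """Coefficients (constant, linear, quadratic, cubic) of f(a + p) in powers of p,
    where f(x) = x^3 - 2x - 5."""
    return (a**3 - 2 * a - 5, 3 * a**2 - 2, 3 * a, 1)


def linear_estimate(a):
    """Estimate p by keeping only the constant and linear terms."""
    constant, linear, _, _ = shifted_coefficients(a)
    return -constant / linear


a = Fraction(2)
print(shifted_coefficients(a))        # coefficients of -1 + 10p + 6p^2 + p^3
p = linear_estimate(a)
print(p)                              # first correction
q = linear_estimate(a + p)
print(float(q), float(a + p + q))     # second correction and new approximation
```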

How, though, can we be sure that this process really 
does converge to x? Let us examine the method more 
closely.

2.3.1 Tangents and Convergence

Newton’s method has a geometrical interpretation, 
which Newton himself did not give, in terms of the 
graph of a function f. A root x of the equation f(x) = 0
corresponds to a point where the curve with equation
y = f(x) intersects the x-axis. If you start with an
approximate value a for x and set p = x - a, as we
did above, then when you substitute a + p for x to
obtain a new function g(p), you are effectively moving
the origin from (0, 0) to the point (a, 0). Then when
you forget all powers of p other than the constant and
linear terms, you are finding the best linear approximation
to the function g, which, geometrically speaking,
is the tangent line to g at the point (0, g(0)). Thus, the
approximate value you obtain for p is the x-coordinate
of the point where the tangent at (0, g(0)) crosses the
x-axis. Adding a to this value returns the origin to (0, 0)
and gives the new approximation to the root of f. This
is why Newton’s method is often called the tangent
method (figure 3). And one can now see that the new
approximation will definitely be better than the old one
if the tangent to f at (a, f(a)) intersects the x-axis at a
point that lies between a and the point where the curve
y = f(x) intersects the x-axis.

As it happens, this is not the case for Newton’s choice 
of the value a = 2 above, but it is true for the approx- 
imate value 2.1 and for all subsequent ones. Geo- 
metrically, the favorable situation occurs if the point 
(a,f(a)) lies above the x-axis in a convex part of the 
curve that crosses the x-axis or below the x-axis in a 
concave part of the curve that crosses the x-axis. Under 
these circumstances, and provided the root is not a 
multiple one, the convergence is quadratic, meaning 
that the error at each stage is roughly the square of 
the error at the previous stage; equivalently, the
approximation is valid to a number of decimal places
that roughly doubles at each stage. This is enormously 
fast. 

The choice of the initial approximation value is obvi- 
ously important, and raises unexpectedly subtle ques- 
tions. These are clearer if we look at complex polyno- 
mials and their complex roots. Newton’s method can be 
easily adapted to this more general context. Suppose 
that z is a root of some complex polynomial and that 
$z_0$ is an initial approximation for z. Newton’s method
then gives us a sequence $z_0, z_1, z_2, \ldots$, which may or
may not converge to z. We define the domain of attraction,
denoted A(z), to be the set of all complex numbers
$z_0$ such that the resulting sequence does indeed
converge to z. How do we determine A(z)?

The first person to pose this problem was Cayley
[VI.46], in 1879. He noticed that the solution is easy
for quadratic polynomials but difficult as soon as the
degree is 3 or more. For example, the domains of
attraction of the roots $\pm 1$ of the polynomial $z^2 - 1$
are the open half-planes bounded by the vertical axis,
but the domains corresponding to the roots 1, $\omega$, and
$\omega^2$ of $z^3 - 1$ are extremely complicated sets. They
were described by Julia in 1918; such subsets are now
called fractal sets. Newton’s method and fractal sets are
discussed further in dynamics [IV.15].
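
One can explore the domains of attraction for $z^3 - 1$ experimentally. The following is a minimal sketch, assuming a fixed iteration limit and tolerance (both arbitrary choices made for the example); the function name is illustrative. It reports which of the three roots, if any, a given starting point is drawn to.

```python
import cmath

# The three cube roots of unity: 1, omega, omega^2.
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]


def attracting_root(z0, max_iter=100, tol=1e-9):
    """Return the index in ROOTS that Newton's method reaches from z0, or None."""
    z = complex(z0)
    for _ in range(max_iter):
        if z == 0:
            return None                      # the derivative 3z^2 vanishes
        z = z - (z**3 - 1) / (3 * z**2)      # one Newton step for z^3 - 1
        for index, root in enumerate(ROOTS):
            if abs(z - root) < tol:
                return index
    return None                              # no convergence detected


print(attracting_root(2))
print(attracting_root(ROOTS[1] + 0.01))
```

Evaluating attracting_root on a fine grid of starting points and colouring each point by the returned index gives a picture of the three domains; a None result only means that the chosen limits were not enough to decide.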

2.3.2 Recurrence Formulas 

At each stage of his method, Newton had to produce 
a new equation, but in 1690 Raphson noticed that this 
was not really necessary. For particular examples, he 
gave single formulas that could be used at each step, 
but his basic observation applies in general and leads 
to a general formula for every case, which one can 
easily obtain using the interpretation in terms of tangents.
Indeed, the tangent to the curve y = f(x) at the
point of x-coordinate a has the equation $y - f(a) = f'(a)(x - a)$,
and it cuts the x-axis at the point with
x-coordinate $a - f(a)/f'(a)$. What we now call the
Newton-Raphson method springs from this simple formula.
One starts with an initial approximation $a_0 = a$
and then defines successive approximations using the
recurrence formula

$a_{n+1} = a_n - \dfrac{f(a_n)}{f'(a_n)}$.

As an example, let us consider the function $f(x) = x^2 - c$.
Here, Newton’s method provides a sequence of
approximations of the square root $\sqrt{c}$ of c, given by
the recurrence formula $a_{n+1} = \tfrac{1}{2}(a_n + c/a_n)$ (which
we obtain by substituting $x^2 - c$ for f in the general
formula above). This method for approximating square
roots was known by Heron of Alexandria in the first
century. Note that if $a_0$ is close to $\sqrt{c}$, then $c/a_0$ is also
close, $\sqrt{c}$ lies between them, and $a_1 = \tfrac{1}{2}(a_0 + c/a_0)$ is
their arithmetic mean.
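
The recurrence is easy to put into code. The following is a minimal sketch, assuming the caller supplies both f and its derivative and that the derivative does not vanish at any iterate; the function names and the fixed number of steps are illustrative choices.

```python
def newton_raphson(f, f_prime, a0, steps):
    """Return the list a_0, a_1, ..., a_steps of Newton-Raphson approximations."""
    a = a0
    approximations = [a]
    for _ in range(steps):
        a = a - f(a) / f_prime(a)    # x-coordinate where the tangent meets the axis
        approximations.append(a)
    return approximations


def heron_sqrt(c, a0, steps):
    """Approximate the square root of c by averaging a_n and c / a_n."""
    a = a0
    for _ in range(steps):
        a = (a + c / a) / 2
    return a


# Newton's own example, starting from a_0 = 2.
print(newton_raphson(lambda x: x**3 - 2 * x - 5, lambda x: 3 * x**2 - 2, 2.0, 4))
print(heron_sqrt(2.0, 1.0, 6))
```

Printing the successive approximations shows the number of stable decimal places growing rapidly from one step to the next, in line with the quadratic convergence described earlier.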

3 Does an Algorithm Always Exist? 

3.1 Hilbert’s Tenth Problem: 

The Need for Formalization 

In 1900, at the Second International Congress of Mathematicians,
Hilbert [VI.63] proposed a list of twenty-three
problems. These problems, and Hilbert’s works in
general, had a huge influence on mathematics during 
the twentieth century (Gray 2000). We are interested 
here in Hilbert’s tenth problem: given a Diophantine 
equation, that is, a polynomial equation with any num- 
ber of indeterminates and with integer coefficients, “a 
process is sought by which it can be determined, in a 
finite number of operations, whether the equation is 
solvable in integral numbers.” In other words, we have 
to find an algorithm which tells us, for any Diophan- 
tine equation, whether or not it has at least one integer 
solution. Of course, for many Diophantine equations it 
is easy to find solutions, or to prove that no solutions 
exist. However, this is by no means always the case: consider,
for instance, the Fermat equation $x^n + y^n = z^n$
($n \geq 3$). (Even before the solution of Fermat’s last
theorem [V.12] an algorithm was known for determining
for any specific n whether this equation had
a solution. However, one could not call it easy.) 

If Hilbert’s tenth problem has a positive answer, then 
one can demonstrate it by exhibiting a “process” of the 
sort that Hilbert asked for. To do this, it is not necessary 
to have a precise understanding of what a “process” is. 
However, if you want to give a negative answer, then 
you have to show that no algorithm exists, and for that 
you need to say precisely what counts as an algorithm. 
In section 1.2 we gave a definition that seems to be rea- 
sonably precise, but it is not precise enough to enable 
us to think about Hilbert’s tenth problem. What kind of 
rules are we allowed to use in an algorithm? How can 
we be sure that no algorithm achieves a certain task, 
rather than just that we are unable to find one? 

3.2 Recursive Functions: Church’s Thesis 

What we need is a formal definition of the notion of an 
algorithm. In the seventeenth century, Leibniz [VI.15]
envisaged a universal language that would allow one to
reduce mathematical proofs to simple computations.
Then, during the nineteenth century, logicians such
as Charles Babbage, Boole [VI.43], Frege [VI.56], and
Peano [VI.62] tried to formalize mathematical reasoning
by an “algebraization” of logic. Finally, between
1931 and 1936, Gödel [VI.92], Church [VI.89], and
Stephen Kleene introduced the notion of recursive func- 
tions (see Davis (1965), which contains the original 
texts). Roughly speaking, a recursive function is one 
that can be calculated by means of an algorithm, but 
the definition of recursive functions is different, and is 
completely precise. 

3.2.1 Primitive Recursive Functions 

Another rough definition of a recursive function is as 
follows: a recursive function is one that has an induc- 
tive definition. To give an idea of what this means, let us 
consider addition and multiplication as functions from 
$\mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$. To emphasize this, we shall write sum(x, y)
and prod(x, y) for x + y and xy, respectively. 

A familiar fact about multiplication is that it is
“repeated addition.” Let us examine this more precisely.
We can define the function “prod” in terms
of the function “sum” by means of the following
two rules: prod(1, y) = y and prod(x + 1, y) =
sum(prod(x, y), y). Thus, if you know prod(x, y) and
you know how to calculate sums, then you know
prod(x + 1, y). Since you also know the “base case”
prod(1, y), a simple inductive argument shows that
these simple rules completely determine the function
“prod.”
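
The two rules can be transcribed almost literally into code. The following Python sketch assumes positive integer arguments, as in the text; the trailing underscore in `sum_` is only there to avoid shadowing Python's built-in `sum`.

```python
def sum_(x, y):
    """The function "sum" from the text: ordinary addition."""
    return x + y

def prod(x, y):
    """Multiplication defined from addition by the two rules:
    prod(1, y) = y and prod(x + 1, y) = sum(prod(x, y), y).
    Assumes x is a positive integer."""
    if x == 1:
        return y                      # base case
    return sum_(prod(x - 1, y), y)    # inductive step

print(prod(6, 7))
```

The recursive call always has a smaller first argument, so it eventually reaches the base case; this is the inductive argument mentioned above in executable form.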

We have just seen how one function can be “recur- 
sively defined” in terms of another. We now want to 
understand the class of all functions from $\mathbb{N}^n$ to $\mathbb{N}$ that
can be built up in a few basic ways, of which recursion
is the most important. We shall refer to functions from
$\mathbb{N}^n$ to $\mathbb{N}$ as n-ary functions.

To begin with, we need an initial stock of functions 
out of which the rest will be built. It turns out that a 
very simple set of functions is enough. Most basic are 
the constant functions: that is, functions that take every 
n-tuple in $\mathbb{N}^n$ to some fixed positive integer c. Another
very simple function, but the function that allows us
to create much more interesting ones, is the successor
function, which takes a positive integer n to the next
one, n + 1. Finally, we have projection functions: the
function $U_k^n$ takes a sequence $(x_1, \dots, x_n)$ in $\mathbb{N}^n$ and
maps it to the kth coordinate $x_k$.

We then have two ways of constructing functions
from other functions. The first is substitution. Given an
m-ary function $\phi$ and m n-ary functions $\psi_1, \dots, \psi_m$,
one defines an n-ary function by
$(x_1, \dots, x_n) \mapsto \phi(\psi_1(x_1, \dots, x_n), \dots, \psi_m(x_1, \dots, x_n))$.
For example, $(x + y)^2 = \mathrm{prod}(\mathrm{sum}(x, y), \mathrm{sum}(x, y))$, so we can
obtain the function $(x, y) \mapsto (x + y)^2$ from the functions
“sum” and “prod” by means of substitution.

The second method of construction is called prim- 
itive recursion. This is a more general form of the 
inductive method we used above in order to con- 
struct the function “prod” from the function “sum.” 
Given an (n − 1)-ary function $\psi$ and an (n + 1)-ary
function $\mu$, one defines an n-ary function $\phi$ by saying
that $\phi(1, x_2, \dots, x_n) = \psi(x_2, \dots, x_n)$ and
$\phi(k + 1, x_2, \dots, x_n) = \mu(k, \phi(k, x_2, \dots, x_n), x_2, \dots, x_n)$. In
other words, $\psi$ tells you the “initial values” of $\phi$
(the values when the first coordinate is 1) and $\mu$
tells you how to work out $\phi(k + 1, x_2, \dots, x_n)$ in
terms of $\phi(k, x_2, \dots, x_n)$, $x_2, \dots, x_n$, and k. (The
sum-product example was simpler because we did not have
a dependence on k.)

A primitive recursive function is any function that can 
be built from the initial stock of functions using the two 
operations, substitution and primitive recursion, that 
we have just described. 

3.2.2 Recursive Functions 

If you think for a while about primitive recursion and 
know a small amount about programming computers, 
you should be able to convince yourself that primitive recursive functions are
effectively computable: that is, that for any primitive 
recursive function there is an algorithm for computing 
it. (For example, the operation of primitive recursion 
can usually be realized in a rather direct way as a FOR 
loop.) 
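
To illustrate the remark about FOR loops, here is a hedged Python sketch of the primitive recursion operation. The helper name `primitive_recursion` and the parameter names `psi` (initial values) and `mu` (recurrence step) are labels chosen for this example; they follow the roles described in the definition of primitive recursion rather than any fixed notation.

```python
def primitive_recursion(psi, mu):
    """Build phi from psi and mu, where
    phi(1, xs...)     = psi(xs...)
    phi(k + 1, xs...) = mu(k, phi(k, xs...), xs...).
    The recursion is unrolled into a FOR loop."""
    def phi(k, *xs):
        value = psi(*xs)              # phi(1, xs...)
        for j in range(1, k):         # compute phi(2), ..., phi(k) in turn
            value = mu(j, value, *xs)
        return value
    return phi

# prod(1, y) = y and prod(k + 1, y) = prod(k, y) + y
prod = primitive_recursion(lambda y: y, lambda k, prev, y: prev + y)

# fact(1) = 1 and fact(k + 1) = (k + 1) * fact(k); here psi takes no arguments
fact = primitive_recursion(lambda: 1, lambda k, prev: (k + 1) * prev)

print(prod(6, 7), fact(5))
```

The loop runs a number of times that is fixed before it starts, which is what distinguishes primitive recursion from the unbounded search introduced later.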

How about the converse? Are all computable func- 
tions primitive recursive? Consider, for example, the 
function that takes the positive integer n to $p_n$, the
nth prime number. It is not hard to devise a simple
algorithm for computing $p_n$, and it is then a good exercise
(if you want to understand primitive recursion) to
convert this algorithm into a proof that the function is 
primitive recursive. 

However, it turns out that this function is not typical: 
there are computable functions that are not primitive 
recursive. In 1928, Wilhelm Ackermann defined a func- 
tion, now known as the Ackermann function, that has a 
“doubly inductive” definition. The following function is
not quite the same as Ackermann’s, but it is very similar.
It is the function A(x, y) that is determined by the
following recurrence rules: 

(i) A(1, y) = y + 2 for every y;

(ii) A(x, 1) = 2 for every x;

(iii) A(x + 1, y + 1) = A(x, A(x + 1, y)) whenever x ≥ 1
and y ≥ 1.

For example, A(2, y + 1) = A(1, A(2, y)) = A(2, y) + 2.
From this and the fact that A(2, 1) = 2 it follows that
A(2, y) = 2y for every y. In a similar way one can
show that $A(3, y) = 2^y$, and in general that for each x
the function that takes y to A(x + 1, y) “iterates” the
function that takes y to A(x, y). This means that the
values of A(x, y) are extremely large even when x and
y are fairly small. For example, $A(4, y + 1) = 2^{A(4, y)}$,
so in general A(4, y) is given by an “exponential tower”
of height y. We have A(4, 1) = 2, $A(4, 2) = 2^2 = 4$,
$A(4, 3) = 2^4 = 16$, $A(4, 4) = 2^{16} = 65\,536$, and
$A(4, 5) = 2^{65536}$, which is too large a number for its
decimal notation to be reproduced here.
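
The recurrence rules translate directly into a doubly recursive Python function. This is only a sketch for small arguments: the recursion becomes very deep very quickly. Rules (i) and (ii) as stated overlap at (1, 1); the code below checks rule (i) first, which is an assumption of this example and does not affect any value used in the discussion above.

```python
def A(x, y):
    """The Ackermann-like function defined by rules (i)-(iii).
    Assumes x and y are positive integers; rule (i) is tested first
    (an arbitrary choice where rules (i) and (ii) overlap)."""
    if x == 1:
        return y + 2                  # rule (i)
    if y == 1:
        return 2                      # rule (ii)
    return A(x - 1, A(x, y - 1))      # rule (iii), with x, y shifted by one

print([A(2, y) for y in range(1, 6)])
print([A(3, y) for y in range(1, 6)])
print([A(4, y) for y in range(1, 4)])
```

Only small inputs are tried here; a call such as A(4, 4) would exceed Python's default recursion depth with this direct transcription.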

It can be shown that for every primitive recursive
function $\phi$ there is some x such that the function
A(x, y) grows faster than $\phi(y)$. This is proved by an
inductive argument. To oversimplify slightly, if $\psi(y)$
and $\mu(y)$ have already been shown to grow more slowly
than A(x, y), then one can show that the function
$\phi$ produced from them by primitive recursion also
grows more slowly. This allows us to define a “diagonal”
function A(y) = A(y, y) that is not primitive
recursive because it grows faster than any of the
functions A(x, y).

If we are trying to understand in a precise way which 
functions can be calculated algorithmically, then our 
definition will surely have to encompass functions like 
the Ackermann function, since they can in principle be 
computed. Therefore, we must consider a larger class 
of functions than just the primitive recursive ones. 
This is what Gödel, Church, and Kleene did, and they
obtained in different ways the same class of recursive
functions. For instance, Kleene added a third method of
construction, which he called minimization. If f is an
(n + 1)-ary function, one defines an n-ary function g
by taking $g(x_1, \dots, x_n)$ to be the smallest y such that
$f(x_1, \dots, x_n, y) = 0$. (If there is no such y, one regards
g as undefined for $(x_1, \dots, x_n)$. We shall ignore this
complication in what follows.)
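
Minimization corresponds to an unbounded search, in contrast with the fixed-length loop of primitive recursion. The Python sketch below is illustrative: the name `minimize` is hypothetical, and the search starts at y = 1 on the assumption that, as elsewhere in this section, the arguments are positive integers.

```python
from itertools import count

def minimize(f):
    """Return g with g(xs...) = the smallest y >= 1 such that f(xs..., y) == 0.
    If no such y exists the search never terminates, which matches
    the convention that g is then undefined."""
    def g(*xs):
        for y in count(1):            # unbounded search: 1, 2, 3, ...
            if f(*xs, y) == 0:
                return y
    return g

# Example: the smallest y with y * y >= x, encoded as a zero of f.
ceil_sqrt = minimize(lambda x, y: 0 if y * y >= x else 1)
print(ceil_sqrt(10), ceil_sqrt(16), ceil_sqrt(17))
```

Nothing in the code bounds the number of iterations in advance, so whether g is defined at a given input depends entirely on f.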

It turns out that, not only is the Ackermann function 
recursive, but so are all functions that one can write
a computer program to calculate. So this gives us the
formal definition of computability that we did not have 
before. 

3.2.3 Effective Calculability 

Having obtained this class of recursive functions, Church
claimed that the class of “effectively calculable” functions
is exactly the class of recursive functions.
This claim is widely believed, but it is a conviction
that cannot be proved, since the notion of recursive
function is a mathematically precise concept while
that of an effectively calculable function is an intuitive
notion, actually quite like that of “algorithm.” Church’s
statement lies in the realm of metamathematics and is
now called Church’s thesis.

3.3 Turing Machines 

One of the strongest pieces of evidence for Church’s 
thesis is that in 1936 Turing [VI.94] found a very
different-looking way of formalizing the notion of an 
algorithm, which he showed was equivalent. That is, 
every function that was computable in his new sense 
was recursive and vice versa. His approach was to 
define a notion that is now called a Turing machine, 
which can be thought of as an extremely primitive com- 
puter, and which played an important part in the devel- 
opment of actual computers. Indeed, functions that 
are computable by Turing machines are precisely those
that can be programmed on a computer. The primi- 
tive architecture of Turing machines does not make 
them any less powerful: it merely means that in prac- 
tice they would be too cumbersome to program or to 
implement in hardware. Since recursive functions are 
the same as Turing-computable functions, it follows 
that recursive functions too are those functions that 
can be programmed on a computer, so to disbelieve 
Church’s thesis would be to maintain that there are 
some “effective procedures” that cannot be converted 
into computer programs, which seems rather implausible.
A description of Turing machines can be found
in COMPUTATIONAL COMPLEXITY [IV.21 §1].

Turing introduced his machines in response to a 
question that generalized Hilbert’s tenth problem. The 
Entscheidungsproblem, or decision problem, was also 
asked by Hilbert, in 1922. He wanted to know whether 
there was a “mechanical process” by which one could 
determine whether any given mathematical statement 
could be proved. In order to think about this, Turing
needed a precise notion of what constituted a “mechanical
process.” Once he had defined Turing machines,
he was able to show by means of a fairly straightfor- 
ward diagonal argument that the answer to Hilbert’s 
question was no. His argument is outlined in
THE INSOLUBILITY OF THE HALTING PROBLEM [V.23].

4 Properties of Algorithms 

4.1 Iteration versus Recursion 

As previously mentioned, we often encounter compu- 
tation rules which define each element of a sequence 
in terms of the preceding elements. This gives rise to 
two different ways of carrying out the computation. 
The first is iteration: one computes the first terms, then 
one obtains succeeding terms by means of a recurrence 
formula. The second is recursion, a procedure which 
seems circular at first because one defines a procedure 
in terms of itself. However, this is allowed because the 
procedure calls on itself with smaller values of the vari- 
ables. The concept of recursion is subtle and powerful. 
Let us try to clarify the difference between recursion 
and iteration with some examples. 

Suppose that we wish to compute n! = 1 · 2 · 3 ⋯ (n − 1) · n.
An obvious way of doing it is to note
the recurrence relation n! = n · (n − 1)! and the initial
value 1! = 1. Having done so, one could then
compute successively the numbers 2!, 3!, 4!, and so
on until one reached n!, which would be the iterative
approach. Alternatively, one could say that if fact(n) is
the result of a procedure that leads to n!, then fact(n) =
n × fact(n − 1), which would be a recursive procedure.
The second approach says that to obtain n! it suffices to
know how to obtain (n − 1)!, and to obtain (n − 1)! it suffices
to know how to obtain (n − 2)!, and so on. Since one
knows that 1! = 1, one can obtain n!. Thus, recursion
is a bit like iteration but thought of “backwards.”
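
The two approaches can be placed side by side in code. In this Python sketch the names `fact_iterative` and `fact_recursive` are chosen for the example; both assume n is a positive integer.

```python
def fact_iterative(n):
    """Work forwards: 1!, 2!, 3!, ..., n!, keeping only the latest value."""
    result = 1                        # 1! = 1
    for k in range(2, n + 1):
        result *= k                   # k! = k * (k - 1)!
    return result

def fact_recursive(n):
    """Work backwards: n! needs (n - 1)!, which needs (n - 2)!, and so on."""
    if n == 1:
        return 1                      # the known initial value 1! = 1
    return n * fact_recursive(n - 1)

print(fact_iterative(10), fact_recursive(10))
```

The iterative version remembers a single running value, whereas the recursive version keeps one pending multiplication for every level of the call chain until it reaches 1!.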

In some ways this example is too simple to show 
clearly the difference between the two procedures. 
Moreover, if one wishes to compute n!, then iteration
seems simpler and more natural than recursion. We 
now look at an example where recursion is far simpler 
than iteration. 

4.1.1 The Tower of Hanoi 

The Tower of Hanoi is a problem that goes back to 
Édouard Lucas in 1884. One is given n disks, all of dif- 
ferent sizes and each with a hole in the middle, stacked 
on a peg A in order of size, with the largest one at the
bottom. We also have two empty pegs B and C. The
problem is to move the stack from peg A to peg B while 
obeying the following rules. One is allowed to move just 
one disk at a time, and each move consists in taking 
the top disk from one of the pegs and putting it onto 
another peg. In addition, no disk may ever be placed 
above a smaller disk. 

The problem is easy if you have just three disks, 
but becomes rapidly harder as the number of disks 
increases. However, with the help of recursion one can 
see very quickly that an algorithm exists for moving 
the disks in the required way. Indeed, suppose that we 
know a procedure H(n − 1) that solves the problem for
n − 1 disks. Here is a procedure H(n) for n disks: move
the first n − 1 disks on top of A to C with the procedure
H(n − 1), then move the last disk on A to B, and finally
apply once more the procedure H(n − 1) to move all the
disks from C to B. If we write $H_{AB}(n)$ for the procedure
that moves n disks from peg A to peg B according to the
rules, then we can represent this recursion symbolically as

$H_{AB}(n) = H_{AC}(n - 1)\, H_{AB}(1)\, H_{CB}(n - 1).$

Thus, $H_{AB}(n)$ is deduced from $H_{AC}(n - 1)$ and $H_{CB}(n - 1)$,
which are clearly equivalent to $H_{AB}(n - 1)$. Since
$H_{AB}(1)$ is certainly easy, we have the full recursion.

One can easily check by induction that this procedure
takes $2^n - 1$ moves; moreover, it turns out that
the task cannot be accomplished in fewer moves. Thus, 
the number of moves is an exponential function of n, 
so for large n the procedure will be very long. 
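
The recursive procedure for the Tower of Hanoi is short enough to write out in full. The following Python sketch returns the list of moves rather than performing them; the function name `hanoi` and its parameter names are illustrative.

```python
def hanoi(n, source="A", target="B", spare="C"):
    """Return the moves (from_peg, to_peg) that transfer n disks
    from `source` to `target`, using `spare` as the third peg."""
    if n == 1:
        return [(source, target)]                 # a single disk is easy
    return (hanoi(n - 1, source, spare, target)   # n - 1 disks: source -> spare
            + [(source, target)]                  # largest disk: source -> target
            + hanoi(n - 1, spare, target, source))  # n - 1 disks: spare -> target

moves = hanoi(3)
print(len(moves), moves)
```

The length of the returned list satisfies the recurrence "twice the previous length plus one", which is where the count of 2^n − 1 moves comes from.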

Furthermore, the larger n is, the more memory one 
must use to keep track of where one is in the procedure. 
By contrast, if we wish to carry out an iteration during 
an iterative procedure, it is usually enough to know just 
the result of the previous iteration. Thus, the most we 
need to remember is the result of one iteration. There 
is in fact an iterative procedure for the Tower of Hanoi
as well. It is easy to describe, but it is much less obvious
that it actually solves the problem. It encodes the posi- 
tions of the n disks as an n-bit sequence and at each 
step applies a very simple rule to obtain the next n-bit 
sequence. This rule makes no reference to how many 
steps have so far taken place, and therefore the amount 
of memory needed, beyond that required to store the 
positions of the disks, is very small. 

4.1.2 The Extended Euclid Algorithm 

Euclid’s algorithm is another example that lends itself 
in a very natural way to a recursive procedure. Recall
that if a and b are two positive integers, then we
can write a = qb + r with 0 ≤ r < b. The algorithm
depended on the observation that gcd(a, b) =
gcd(b, r). Since the remainder r can be calculated easily
from a and b, and since the pair (b, r) is smaller
than the pair (a, b), this gives us a recursive procedure,
which stops when we reach a pair of the form (a, 0).
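
As a minimal sketch, the recursive form of Euclid's algorithm takes only a few lines of Python. The function is called `gcd` here for readability; Python's standard library also provides `math.gcd`, which is not used in this example.

```python
def gcd(a, b):
    """Euclid's algorithm, recursively: gcd(a, b) = gcd(b, r), where
    r is the remainder of a on division by b; stop at a pair (a, 0)."""
    if b == 0:
        return a
    return gcd(b, a % b)

print(gcd(38, 21), gcd(48, 18))
```

Each call replaces the pair by a smaller one, so the recursion must terminate.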

An important extension of Euclid’s algorithm is 
Bézout’s lemma, which states that for any pair of positive
integers (a, b) there exist (not necessarily positive)
integers u and v such that 

ua + vb = d = gcd(a, b). 

How can we obtain such integers u and v? The answer
is given by the extended Euclid algorithm, which again
can be defined using recursion. Suppose we can find a
pair (u', v') that works for b and r: that is, u'b + v'r =
d. Since a = qb + r, we can substitute r = a − qb into
this equation and deduce that d = u'b + v'(a − qb) =
v'a + (u' − v'q)b. Thus, setting u = v' and v = u' − v'q,
we have ua + vb = d. Since a pair (u, v) that works
for a and b can be easily calculated from a pair (u', v')
that works for the smaller b and r, this gives us a recursive
procedure. The “bottom” of the recursion is when
r = 0, in which case we know that 1b + 0r = d. Once we
reach this, we can “run back up” through Euclid’s algorithm,
successively modifying our pair (u, v) according
to the rule just given. Notice, incidentally, that the fact
that this procedure exists is a proof of Bézout’s lemma.
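
The recursion just described can be sketched in Python as follows. The name `extended_gcd` and the convention of returning the triple (d, u, v) are choices made for this example; a and b are assumed to be positive integers.

```python
def extended_gcd(a, b):
    """Return (d, u, v) with u*a + v*b == d == gcd(a, b)."""
    q, r = divmod(a, b)               # a = q*b + r with 0 <= r < b
    if r == 0:
        return b, 0, 1                # bottom of the recursion: d = b = 0*a + 1*b
    d, u1, v1 = extended_gcd(b, r)    # u1*b + v1*r == d
    return d, v1, u1 - v1 * q         # u = v', v = u' - v'*q

d, u, v = extended_gcd(38, 21)
print(d, u, v, u * 38 + v * 21)
```

The pair (u, v) is assembled on the way back up the chain of calls, which is why this version has to remember every quotient until the recursion bottoms out.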

4.2 Complexity 

So far we have considered algorithms in a theoretical 
way and ignored their obvious practical importance. 
However, the mere existence of an algorithm for car- 
rying out a certain task does not guarantee that your 
computer can do it, because some algorithms take so 
many steps that no computer can implement them 
(unless you are prepared to wait billions of years for 
the answer). The complexity of an algorithm is, loosely 
speaking, the number of steps it takes to complete 
its task (as a function of the size of the input). More 
precisely, this is the time complexity of the algorithm. 
There is also its space complexity, which measures the 
maximum amount of memory a computer needs in 
order to implement it. Complexity theory is the study 
of the computational resources that are needed to carry 
out various tasks. It is discussed in detail in computa- 
tional complexity [IV.21] — here we shall give a hint 
of it by examining the complexity of one algorithm.

4.2.1 The Complexity of Euclid’s Algorithm

The length of time that a computer will take to imple- 
ment Euclid’s algorithm is closely related to the number 
of times one needs to compute quotients and remain- 
ders: that is, to the number of times that the recur- 
sive procedure calls on itself. Of course, this number 
depends in turn on the size of the numbers a and b 
whose gcd is to be determined. An initial observation
is that if 0 < b < a, then the remainder in the division
of a by b is less than a/2. To see this, notice that
if b > a/2 then the remainder is a − b, which is less
than a/2, whereas if b ≤ a/2 then we know that the
remainder is less than b and so is again less than a/2. It
follows that after two steps of calculating the remainder,
one arrives at a pair where the larger number is
at most half what it was before. From this it is easy to
show that the number of such calculations needed is at
most $2 \log_2 a + 1$, which is roughly proportional to the
number of digits of a. Since this number is far smaller
than a itself, the algorithm can be used easily for very
large numbers, which gives it great practical utility to
go with its theoretical significance.

The number of divisions needed in the worst case 
does not appear to have been studied until the first half 
of the nineteenth century: the above bound of $2 \log_2 a + 1$
was given by Pierre-Joseph-Étienne Finck in 1841. It
is in fact not hard to improve this result slightly and
prove that the algorithm takes longest when a and b are
consecutive Fibonacci numbers. This implies that the
number of divisions needed is never more than $\log_\phi a + 1$,
where $\phi$ is the golden ratio.
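
Both bounds are easy to compare with the actual number of divisions on small inputs. In this Python sketch the helper names `division_steps`, `finck_bound`, and `fibonacci_bound` are hypothetical labels for the quantities discussed above.

```python
import math

PHI = (1 + math.sqrt(5)) / 2          # the golden ratio

def division_steps(a, b):
    """Count the divisions Euclid's algorithm performs on the pair (a, b)."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return steps

def finck_bound(a):
    """The bound 2 * log_2(a) + 1."""
    return 2 * math.log2(a) + 1

def fibonacci_bound(a):
    """The sharper bound log_phi(a) + 1."""
    return math.log(a, PHI) + 1

# 89 and 55 are consecutive Fibonacci numbers.
print(division_steps(89, 55), finck_bound(89), fibonacci_bound(89))
```

Trying other pairs with the same larger number 89 gives a feel for how exceptional the Fibonacci pair is.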

Euclid’s algorithm also has low space complexity: 
once one has replaced a pair (a, b) by a new pair ( b,r ), 
one can forget the original pair, so at any stage one 
does not have to hold very much in one’s memory (or
store it in the memory of one’s computer). By contrast, 
the extended Euclid algorithm appears to require one 
to remember the entire sequence of calculations that 
leads to the gcd d of a and b, so that one can make
a series of substitutions and eventually find u and v
such that ua + vb = d. However, a closer look at it
reveals that one can perform it while keeping track of 
only a few numbers at any one time. 

Let us see how this works with an example. We shall
set a = 38, b = 21, and find integers u and v such that
38u + 21v = 1. We begin by writing down the first step
of Euclid’s algorithm:

38 = 1 × 21 + 17.

This tells us that 17 = 38 − 21. Now we write down the
second step:

21 = 1 × 17 + 4.

We know how to write 17 in terms of 38 and 21, so let
us do a substitution:

21 = 1 × (38 − 21) + 4.

Rearranging this, we discover that 4 = 2 × 21 − 38. Now
we write down the third step of Euclid’s algorithm:

17 = 4 × 4 + 1.

We know how to write 17 and 4 in terms of 38 and 21,
so let us substitute again:

38 − 21 = 4 × (2 × 21 − 38) + 1.

Rearranging this, we find that 1 = 5 × 38 − 9 × 21, and
we have finished.

If you think about this procedure, you will see that 
at each stage one just has to keep track of how two 
numbers are expressed in terms of a and b. Thus, the 
space complexity of the extended Euclid algorithm is 
small if you implement it properly. 
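
One way to implement it "properly" is to carry, alongside each of the two current remainders, its expression in terms of a and b. The Python sketch below is one such arrangement, offered as an illustration rather than as the definitive form; the variable names are chosen for this example.

```python
def extended_gcd_iterative(a, b):
    """Return (d, u, v) with u*a + v*b == d == gcd(a, b), storing only
    two remainders and their coefficients at any time."""
    # Invariants: r0 == u0*a + v0*b and r1 == u1*a + v1*b.
    r0, u0, v0 = a, 1, 0
    r1, u1, v1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1      # one step of Euclid's algorithm
        u0, u1 = u1, u0 - q * u1      # keep the expressions in step
        v0, v1 = v1, v0 - q * v1
    return r0, u0, v0

print(extended_gcd_iterative(38, 21))
```

For a = 38 and b = 21 the successive remainders 17, 4, and 1 are accompanied by the coefficient pairs (1, −1), (−1, 2), and (5, −9), mirroring the substitutions in the worked example.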

5 Modern Aspects of Algorithms

5.1 Algorithms and Chance

Earlier it was remarked that the notion of algorithm 
has continued to develop even since its formalization 
in the 1920s and 1930s. One of the main reasons for 
this has been the realization that randomness can be 
a very useful tool in algorithms. This may seem puz- 
zling at first, since algorithms as we have described 
them are deterministic procedures; in a moment we 
shall give an example that illustrates how randomness 
can be used. A second reason is the recent development
of the notion of a quantum algorithm: for more about
this, see QUANTUM COMPUTATION [III.76].

The following simple example illustrates how chance 
can be useful. Given an integer n, we shall define a function
f(n) that is not too hard to calculate but is difficult
to analyze. If n has d digits, then you approximate
$\sqrt{n}$ to the point where the first d digits after the decimal
point are correct (using Newton’s method, say),
and let f(n) equal the dth of those digits. Now suppose that you
wish to know roughly what proportion of numbers n
between $10^{30}$ and $10^{31}$ have f(n) = 0. There does not
seem to be a good way of determining this theoretically,
but calculating it on a computer looks very hard, too,
as there are so many numbers between $10^{30}$ and $10^{31}$.
However, if one chooses a random sample of 10 000
numbers between $10^{30}$ and $10^{31}$ and does the calculation
for just those numbers, then with high probability
the proportion of those numbers with f(n) = 0 will be 
roughly the same as the proportion of all numbers in 
the range with f(n) = 0. Thus, if you do not demand 
absolute certainty but instead are satisfied with a very 
small error probability, then you can achieve your goal 
with much more modest computational resources. 
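
The sampling experiment can be sketched in a few lines of Python. Two choices here are assumptions of this example rather than part of the description above: the dth decimal digit is obtained exactly from an integer square root instead of Newton's method, and the sample is drawn with Python's built-in pseudorandom generator. The names `f` and `estimate_zero_fraction` are illustrative.

```python
import math
import random

def f(n):
    """The d-th digit after the decimal point of sqrt(n), where d is
    the number of decimal digits of n."""
    d = len(str(n))
    # floor(sqrt(n) * 10**d) ends in the d-th decimal digit of sqrt(n).
    return math.isqrt(n * 10 ** (2 * d)) % 10

def estimate_zero_fraction(lo, hi, samples, seed=0):
    """Estimate the proportion of n in [lo, hi) with f(n) == 0
    from a pseudorandom sample of the given size."""
    rng = random.Random(seed)
    hits = sum(f(rng.randrange(lo, hi)) == 0 for _ in range(samples))
    return hits / samples

print(estimate_zero_fraction(10 ** 30, 10 ** 31, 10_000))
```

The sample size controls the error probability: a larger sample makes a large deviation from the true proportion less likely, at a cost that grows only linearly.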

5.1.1 Pseudorandom Numbers 

How, though, does one use a deterministic computer to 
select ten thousand random numbers between $10^{30}$ and
$10^{31}$? The answer is that one does not in fact need to: it
is almost always good enough to make a pseudorandom
selection instead. The basic idea is well illustrated by
a method proposed by von Neumann [VI.91] in the
mid 1940s. One begins with a 2n-digit integer a, called
the “seed,” calculates $a^2$, and extracts from $a^2$ a new
2n-digit number b by taking all the digits of $a^2$ from the
(n + 1)st to the 3nth. One then repeats the procedure
for b, and so on. Because of the way multiplication jum- 
bles up digits, the resulting sequence of 2n-digit num- 
bers is very hard to distinguish from a truly random 
sequence, and can be used in randomized algorithms. 
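
The procedure is simple to sketch in Python. The function name `middle_square` is chosen for this example, and one detail is an assumption: when the square has fewer than 4n digits it is padded with leading zeros so that the "(n + 1)st to 3nth" digits are always well defined.

```python
def middle_square(seed, n, count):
    """Return `count` successive values of the method, starting from a
    2n-digit seed: square, then keep digits n+1 through 3n."""
    a = seed
    values = []
    for _ in range(count):
        square = str(a * a).zfill(4 * n)   # pad a^2 to 4n digits (assumption)
        a = int(square[n:3 * n])           # the middle 2n digits
        values.append(a)
    return values

print(middle_square(1234, 2, 3))   # → [5227, 3215, 3362]
```

For the seed 1234 the first square is 1522756, which is padded to 01522756 before its middle four digits 5227 are extracted.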

There are many other ways of producing pseudo- 
random sequences, and this raises an obvious ques- 
tion: what properties should a sequence have for us 
to regard it as pseudorandom? This turns out to be a 
complex question, and several different answers have 
been proposed. Randomized algorithms and pseudo- 
randomness are discussed in depth in computational 
complexity [IV.21 §§6, 7], and a formal definition of 
“pseudorandom generators” can be found there. (See 
also COMPUTATIONAL NUMBER THEORY [IV.5 §2] for an
account of a famous randomized algorithm for testing
whether a number is prime.) Here, let us discuss a similar
question about infinite sequences of zeros and ones.
When should we regard such a sequence as “random”? 

Again, many different answers have been suggested. 
One idea is to consider simple statistical tests: we 
would expect that in the long run the frequency of zeros 
should be roughly the same as that of ones, and more 
generally that any small subsequence such as 00110 
should appear with the “right” frequency (which for 
this sequence would be 1/32, since it has length 5).
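
A simple test of this kind can be sketched as follows. The function `block_frequency` is a hypothetical helper that slides a window along a string of bits and reports how often a given block appears; the pseudorandom input is generated with Python's standard library purely for illustration.

```python
import random

def block_frequency(bits, block):
    """Fraction of positions in `bits` at which `block` begins."""
    windows = len(bits) - len(block) + 1
    hits = sum(bits[i:i + len(block)] == block for i in range(windows))
    return hits / windows

rng = random.Random(0)
bits = "".join(rng.choice("01") for _ in range(100_000))
print(block_frequency(bits, "0"), block_frequency(bits, "00110"), 1 / 32)
```

For a sequence that behaves randomly, the second printed value should be close to the third, and the first close to one half.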

It is perfectly possible, however, for a sequence to
pass these simple tests but to be generated by a deterministic
procedure. If we are trying to decide whether
a sequence of zeros and ones is actually random (that
is, produced by some means such as tossing a coin),
then we will be very suspicious of a sequence if
we can identify an algorithm that produces the same
sequence. For example, we would reject a sequence that
was derived in a simple way from the digits of π, even
if it passed the statistical tests. However, merely to ask
that a sequence cannot be produced by a recursive pro- 
cedure does not give a good test for randomness: for 
example, if one takes any such sequence and alternates 
the terms of that sequence with zeros, one then obtains 
a new sequence that is far from random, but which still 
cannot be produced recursively. 

For this reason, von Mises suggested in 1919 that a 
sequence of zeros and ones should be called random if 
it is not only the case that the limit of the frequency of 
ones is 1/2, but also that the same is true for any subsequence
that can be extracted “by means of a reasonable
procedure.” In 1940 Church made this more precise by
translating “by means of a reasonable procedure” into
“by means of a recursive function.” However, even this
condition is too weak: there are such sequences that
do not satisfy the “law of the iterated logarithm” (something
that a random sequence would satisfy). Currently,
the so-called Martin-Löf thesis, formulated in 1966, is
one of the most commonly used definitions of randomness:
a random sequence is a sequence that satisfies all
the “effective statistical sequential tests,” a notion that
we cannot formulate precisely here, but which uses in
an essential manner the notion of recursive function. By
contrast with Church’s thesis, with which almost every
mathematician agrees, the Martin-Löf thesis is still very
much under discussion. 

5.2 The Influence of Algorithms on Contemporary Mathematics

Throughout its history, mathematics has concerned 
itself with problems of existence. For example, are there 
transcendental numbers [III.43], that is, numbers 
that are not the root of any polynomial with integer 
coefficients? There are two kinds of answers to such 
questions: either one actually exhibits a number such 
as π and proves that it is transcendental (this was done
by Carl Lindemann in 1882), or one gives an “indirect
existence proof,” such as Cantor’s [VI.54] demonstration
that there are “far more” real numbers than there
are roots of polynomials with integer coefficients (see
COUNTABLE AND UNCOUNTABLE SETS [III.11]), which
shows in particular that some real numbers must be 
transcendental.

5.2.1 Constructivist Schools

In around 1910, under the influence of Brouwer
[VI.75], the intuitionist school [II.7 §3.1] of mathematics
arose, which rejected the principle of the
excluded middle, which is the principle that every
mathematical assertion is either true or false. In particular,
Brouwer did not accept that the existence of a
mathematical object such as a transcendental number
is proved by the fact that its nonexistence would lead
to a contradiction. This was the first of several “con- 
structivist” schools, for which an object exists if and 
only if it can be constructed explicitly. 

Not many working mathematicians subscribe to 
these principles, but almost all would agree that there 
is an important difference between constructive proofs 
and indirect proofs of existence, a difference that has 
come to seem more important with the rise of com- 
puter science. This has added a further level of refine- 
ment: sometimes, even if you know that a mathemat- 
ical object can be produced algorithmically, you still 
care whether the algorithm can be made to work in a 
reasonably short time. 

5.2.2 Effective Results 

In number theory there is an important distinction
between “effective” and “ineffective” results. For example,
the Mordell conjecture [V.31], proposed in 1922
and finally proved by Faltings in 1983, states that a
smooth rational plane curve of degree n > 3 has at
most finitely many points with rational coordinates.
Among its many consequences is that the Fermat equation
$x^n + y^n = z^n$ has only finitely many integral solutions
for each n ≥ 4. (Of course, we now know that it
has no nontrivial solutions, but the Mordell conjecture
was proved before Fermat’s last theorem, and it has
many other consequences.) However, Faltings’s proof is
ineffective, which means that it does not give any information
about how many solutions there are (except that
there are not infinitely many), or how large they can be,
so one cannot use a computer to find them all and know
that one has finished the job. There are many other
very important proofs in number theory that are ineffective,
and replacing any one of them with an effective
argument would be a major breakthrough.

A completely different set of issues was raised by 
another solution to a famous open problem, the four-color
theorem [V.14], which was conjectured by Francis
Guthrie, a student of De Morgan [VI.38], in 1852
and proved in 1976 by Appel and Haken, with a proof
that made essential use of computers. They began with
a theoretical argument that reduced the problem to
checking finitely many cases, but the number of cases
was so large that it could not be done by hand and was
instead done by computers. But how should we judge
such a proof? Can we be sure that the computer has 
been programmed correctly? And even if it has, how do 
we know with a computation of that size that the com- 
puter has operated correctly? And does a proof that 
relies on a computer really tell us why the theorem is 
true? These questions continue to be debated today.

II.5 The Development of Rigor in Mathematical Analysis

Tom Archibald 


1 Background 

This article is about how rigor was introduced into 
mathematical analysis. The question is a complicated 
one, since mathematical practice has changed consid- 
erably, especially in the period between the founding of 
the calculus (shortly before 1700) and the early twen- 
tieth century. In a sense, the basic criteria for what 
constitutes a correct and logical argument have not 
altered, but the circumstances under which one would 
require such an argument, and even to some degree the 
purpose of the argument, have altered with time. The 
voluminous and successful mathematical analysis of 
the 1700s, associated with names such as Johann and 
Daniel Bernoulli [VI.18], Euler [VI.19], and Lagrange
[VI.22], lacked foundational clarity in ways that were
criticized and remedied in subsequent periods. By
around 1910 a general consensus had emerged about
how to make arguments in analysis rigorous. 

Mathematics consists of more than techniques for 
calculation, methods for describing important features 
of geometric objects, and models of worldly phenom- 
ena. Almost all working mathematicians today are 
trained in, and concerned with, the production of rig- 
orous arguments that justify their conclusions. These 
conclusions are usually framed as theorems, which are 
statements of fact, accompanied by an argument, or
proof, that the theorem is indeed true. Here is a simple 
example: every positive whole number that is divisible 
by 6 is also divisible by 2. Running through the six times
table (6, 12, 18, 24, ...) we see that each number is even, 
which makes the statement easy enough to believe. A 
possible justification of it would be to say that since 6 
is divisible by 2, then every number divisible by 6 must 
also be divisible by 2. 

Such a justification might or might not be thought 
of as a thorough proof, depending on the reader. For 
on hearing the justification we can raise questions: is it 
always true that if a, b, and c are three positive whole 
numbers such that c is divisible by b and b is divisible
by a, then c is divisible by a? What is divisibility
exactly? What is a whole number? The mathematician
deals with such questions by precisely defining con- 
cepts (such as divisibility of one number by another), 
basing the definitions on a smallish number of unde- 
fined terms (“whole number” might be one, though it 
is possible to start even further back, with sets). For 
example, one could define a number n to be divisible 
by a number m if and only if there exists an integer q 
such that qm = n. Using this definition, we can give a 
more precise proof: if n is divisible by 6, then n = 6q 
for some q, and therefore n = 2(3q), which proves that
n is divisible by 2. Thus we have used the definitions 
to show that the definition of divisibility by 2 holds 
whenever the definition of divisibility by 6 holds. 
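
The definition-based argument above can be mirrored in a few lines of code. This is only an illustrative sketch: the function names `divides` and `two_witness` are invented here, and a finite check of course proves nothing in general. It simply makes visible the witness 3q that the proof constructs.

```python
def divides(m: int, n: int) -> bool:
    """Return True if n is divisible by m, i.e. n == q * m for some integer q."""
    return n % m == 0


def two_witness(n: int) -> int:
    """For n divisible by 6, return the integer 3q such that n == 2 * (3q)."""
    if not divides(6, n):
        raise ValueError("n must be divisible by 6")
    q = n // 6      # n = 6q
    return 3 * q    # hence n = 2(3q)


# Check the construction on the first hundred multiples of 6.
for n in range(6, 601, 6):
    assert n == 2 * two_witness(n)
    assert divides(2, n)
```

The point of the proof is that the same construction works for every q at once, which no loop can show.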

Historically, mathematical writers have been satis- 
fied with varying levels of rigor. Results and methods 
have often been widely used without a full justification 
of the kind just outlined, particularly in bodies of math- 
ematical thought that are new and rapidly developing. 
Some ancient cultures, the Egyptians for example, had 
methods for multiplication and division, but no justification
of these methods has survived and it does not
seem especially likely that formal justification existed. 
The methods were probably accepted simply because 
they worked, rather than because there was a thorough 
argument justifying them. 


By the middle of the seventeenth century, European 
mathematical writers who were engaged in research 
were well-acquainted with the model of rigorous mathematical
argument supplied by Euclid’s [VI.2] Elements.
The kind of deductive, or synthetic, argument
we illustrated earlier would have been described as a 
proof more geometrico — in the geometrical way. While 
Euclid’s arguments, assumptions, and definitions are 
not wholly rigorous by today’s standards, the basic idea 
was clear: one proceeds from clear definitions and gen- 
erally agreed basic ideas (such as that the whole is 
greater than the part) to deduce theorems (also called 
propositions) in a step-by-step manner, not bringing 
in anything extra (either on the sly or unintentionally). 
This classical model of geometric argument was widely 
used in reasoning about whole numbers (for example 
by Fermat [VI.12]), in analytic geometry (Descartes
[VI.11]), and in mechanics (Galileo).

This article is about rigor in analysis, a term which 
itself has had a shifting meaning. Coming from ancient
origins, by around 1600 the term was used to refer to 
mathematics in which one worked with an unknown 
(something we would now write as x) to do a calculation 
or find a length. In other words, it was closely related to 
algebra, though the notion was imported into geometry 
by Descartes and others. However, over the course of 
the eighteenth century the word came to be associated 
with the calculus, which was the principal area of appli- 
cation of analytic techniques. When we talk about rigor 
in analysis it is the rigorous theory of the mathematics 
associated with differential and integral calculus that 
we are principally discussing. In the third quarter of 
the seventeenth century rival methods for the differential
and integral calculus were devised by Newton
[VI.14] and Leibniz [VI.15], who thereby synthesized
and extended a considerable amount of earlier work 
concerned with tangents and normals to curves and 
with the areas of regions bounded by curves. The tech- 
niques were highly successful, and were extended read- 
ily in a variety of directions, most notably in mechanics 
and in differential equations. 

The key common feature of this research was the use 
of infinities: in some sense, it involved devising meth- 
ods for combining infinitely many infinitely small quan- 
tities to get a finite answer. For example, suppose we 
divide the circumference of a circle into a (large) num- 
ber of equal parts by marking off points at equal dis- 
tances, then joining the points and creating triangles by 
joining the points to the center. Adding up the areas of 
the triangles approximates the circular area, and the
more points we use the better the approximation. If
we imagine infinitely many of these inscribed triangles,
the area of each will be “infinitely small” or infinitesimal.
But because the total involves adding up infinitely
many of them, it may be that we get a finite positive
total (rather than just 0, from adding up infinitely
many zeros, or an infinite number, as we would get
if we added the same finite number to itself infinitely
many times). Many techniques for doing such calculations
were devised, though the interpretation of what
was taking place varied. Were the infinities involved
“real” or merely “potential”? If something is “really” 
infinitesimal, is it just zero? Aristotelian writers had 
abhorred actual infinities, and complaints about them 
were common at the time. 
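
The inscribed-triangle construction can be tried numerically. The sketch below assumes a circle of radius r and uses the standard formula (1/2) r^2 sin(2π/n) for the area of each of the n equal triangles; that formula and the function name are supplied here for illustration and are not part of the historical discussion.

```python
import math


def inscribed_polygon_area(n: int, r: float = 1.0) -> float:
    """Total area of n equal triangles with apex at the center of a circle of radius r."""
    triangle = 0.5 * r * r * math.sin(2 * math.pi / n)
    return n * triangle


# Each triangle shrinks toward zero area, yet the total approaches pi * r^2.
for n in (6, 60, 600, 6000):
    area = inscribed_polygon_area(n)
    print(n, area / n, area, math.pi - area)
```

As n grows, the area of a single triangle tends to 0 while the sum of all n of them tends to the finite positive area of the circle, which is the behavior the text describes.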

Newton, Leibniz, and their immediate followers provided
mathematical arguments to justify these methods.
However, the introduction of techniques involving
reasoning with infinitely small objects, limiting
processes, infinite sums, and so forth meant that
the founders of the calculus were exploring new 
ground in their arguments, and the comprehensibility 
of these arguments was frequently compromised by 
vague terms, or the drawing of one conclusion when 
another might seem to follow equally well. The objects 
they were discussing included infinitesimals (quantities 
infinitely smaller than those we experience directly), 
ratios of vanishingly small quantities (i.e., fractions 
in, or approaching, the form 0/0), and finite sums of 
infinitely many positive terms. Taylor series represen- 
tations, in particular, provoked a variety of questions. 
A function may be written as a series in such a way 
that the series, when viewed as a function, will have, at 
a given point x = a, the same value as the function, the 
same rate of change (or first derivative), and the same 
higher-order derivatives to arbitrary order: 
$$f(x) = f(a) + f'(a)(x - a) + \tfrac{1}{2} f''(a)(x - a)^2 + \cdots.$$
For example, $\sin x = x - x^3/3! + x^5/5! - \cdots$, a fact
already known to Newton though such series are now
named after Newton’s disciple Brook Taylor [VI.16].
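
To see the series for the sine at work, one can sum its first few terms and compare with a library value. This is a minimal sketch; the function name `sin_taylor` is made up for the illustration.

```python
import math


def sin_taylor(x: float, terms: int) -> float:
    """Sum the first `terms` nonzero terms of x - x^3/3! + x^5/5! - ..."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total


# The partial sums settle down quickly for moderate x.
for terms in (1, 2, 3, 5, 10):
    print(terms, sin_taylor(1.0, terms), math.sin(1.0))
```

Agreement of this kind at particular points is exactly what eighteenth-century writers observed; it does not by itself settle for which x, or in what sense, the infinite sum equals the function.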

One problem with early arguments was that the 
terms being discussed were used in different ways by 
different writers. Other problems arose from this lack 
of clarity, since it concealed a variety of issues. Per- 
haps the most important of these was that an argument 
could fail to work in one context, even though a very 
similar argument worked perfectly well in another. In
time, this led to serious problems in extending analysis. 
Eventually, analysis became fully rigorous and these
difficulties were solved, but the process was a long
one and it was complete only by the beginning of the 
twentieth century. 

Let us consider some examples of the kinds of dif- 
ficulties that arose from the very beginning, using a 
result of Leibniz. Suppose we have two variables, u 
and v, each of which changes when another variable, 
x, changes. An infinitesimal change in x is denoted dx, 
the differential of x. The differential is an infinitesimal 
quantity, thought of as a geometrical magnitude, such 
as a length, for example. This was imagined to be com- 
bined or compared with other magnitudes in the usual 
ways (two lengths can be added, have a ratio, and so on).
When x changes to x + dx, u and v change to u + du 
and v + dv, respectively. Leibniz concluded that the 
product uv would then change to uv + u dv + v du, 
so that d(uv) = u dv + v du. His argument is, roughly,
that d(uv) = (u + du)(v + dv) - uv. Expanding the
right-hand side using regular algebra and then simpli- 
fying gives u dv + v du + du dv. But the term du dv 
is a second-order infinitesimal, vanishingly small com- 
pared with the first-order differentials, and is thus 
treated as equal to 0. Indeed, one aspect of the prob- 
lems is that there appears to be an inconsistency in the 
way that infinitesimals are treated. For instance, if you 
want to work out the derivative of $y = x^2$, the calculation
corresponding to the one just given (expanding
$(x + dx)^2$, and so on) shows that dy/dx = 2x + dx. We
then treat the dx on the right-hand side as zero, but 
the one on the left-hand side seems as though it ought 
to be an infinitesimal nonzero quantity, since otherwise 
we could not divide by it. So is it zero or not? And if not, 
how do we get around the apparent inconsistency? 
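
The algebra behind both calculations can be checked exactly with small but finite increments standing in for the differentials. The sketch below uses rational arithmetic so that no rounding intervenes; the function names and the sample values are chosen here purely for illustration.

```python
from fractions import Fraction


def product_increment(u, v, du, dv):
    """Exact change in u*v when u becomes u + du and v becomes v + dv."""
    return (u + du) * (v + dv) - u * v


def square_quotient(x, dx):
    """Exact value of ((x + dx)^2 - x^2) / dx for nonzero dx."""
    return ((x + dx) ** 2 - x ** 2) / dx


u, v, x = Fraction(3), Fraction(5), Fraction(7)
for k in (1, 3, 6):
    h = Fraction(1, 10 ** k)
    # The leftover term du*dv is of second order in h.
    assert product_increment(u, v, h, h) == u * h + v * h + h * h
    # The quotient is exactly 2x + dx; it is never exactly 2x.
    assert square_quotient(x, h) == 2 * x + h
```

For every nonzero increment the extra term is present, and for a zero increment the quotient is undefined, which is the inconsistency the paragraph describes.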

At a slightly more technical level, the calculus 
required mathematicians to deal repeatedly with the 
“ultimate” values of ratios of the form dy/dx when
the quantities in both numerator and denominator 
approach or actually reach 0. This phrasing uses, once 
again, the differential notation of Leibniz, though the 
same issues arose for Newton with a slightly different 
notational and conceptual approach. Newton generally 
spoke of variables as depending on time, and he sought 
(for example) the values approached when “evanescent
increments” (vanishingly small time intervals)
are considered. One long-standing set of confusions
arose precisely from this idea that variable quantities 
were in the process of changing, whether with time 
or with changes in the value of another variable. This 
means that we talk about values of a variable approaching
a given value, but without a clear idea of what this
“approach” actually is. 

2 Eighteenth-Century Approaches 
and Critiques 

Of course, had the calculus not turned out to be an 
enormously fruitful field of endeavor, no one would 
have bothered to criticize it. But the methods of New- 
ton and Leibniz were widely adopted for the solution 
of problems that had interested earlier generations 
(notably tangent and area problems) and for the pos- 
ing and solution of problems that these techniques 
suddenly made far more accessible. Problems of areas, 
maxima and minima, the formulation and solution of 
differential equations to describe the shape of hanging 
chains or the positions of points on vibrating strings, 
applications to celestial mechanics, the investigation of 
problems having to do with the properties of functions 
(thought of for the most part as analytic expressions 
involving variable quantities): all these fields and more
were developed over the course of the eighteenth cen- 
tury by such individuals as Taylor, Johann and Daniel 
Bernoulli, Euler, d’Alembert [VI.20], Lagrange, and
many others. These people employed many virtuoso 
arguments of suspect validity. Operations with divergent
series, the use of imaginary numbers, and manipulations
involving actual infinities were used effectively
in the hands of the most capable of these writers. However,
the methods could not always be explained to
the less capable, and thus certain results were not reliably
reproducible, a very odd state for mathematics
from today’s standpoint. To do Euler’s calculations, one 
needed to be Euler. This was a situation that persisted 
well into the following century. 

Specific controversies often highlighted issues that
we now see as a result of foundational confusion. In the
case of infinite series, for example, there was confusion
about the domain of validity of formal expressions. 
Consider the series 

$$1 - 1 + 1 - 1 + 1 - 1 + 1 - \cdots.$$

In today’s usual elementary definition (due to Cauchy
[VI.29] around 1820) we would now consider this series
to be divergent because the sequence of partial sums
1, 0, 1, 0, . . . does not tend to a limit. But in fact there
was some controversy about the actual meaning of such
expressions. Euler and Nicholas Bernoulli, for example,
discussed the potential distinction between the sum
and the value of an infinite sum, Bernoulli arguing that
something like $1 - 2 + 6 - 24 + 120 - \cdots$ has no sum but
that this algebraic expression does constitute a value.
Whatever may have been meant by this, Euler defended
the notion that the sum of the series is the value of
the finite expression that gives rise to the series. In
his 1755 Institutiones Calculi Differentialis, he gives the
example of $1 - x + x^2 - x^3 + \cdots$, which comes from
1/(1 + x), and later defended the view that this meant
that $1 - 1 + 1 - 1 + \cdots = \tfrac{1}{2}$. His view was not universally
accepted. Similar controversies arose in considering
how to extend the values of functions outside
their usual domain, for example with the logarithms of 
negative numbers. 
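
The two readings of the series 1 - 1 + 1 - 1 + ... can be placed side by side numerically. In this sketch (function names invented for the illustration) the first computation follows the partial-sum definition, and the second evaluates the truncated series 1 - x + x^2 - ... for x slightly below 1, where it does agree with 1/(1 + x).

```python
def partial_sums(terms):
    """Return the list of running totals of a finite list of terms."""
    sums, total = [], 0
    for t in terms:
        total += t
        sums.append(total)
    return sums


def alternating_power_sum(x: float, n: int) -> float:
    """Sum of the first n terms of 1 - x + x^2 - x^3 + ..."""
    return sum((-x) ** k for k in range(n))


# Partial sums of 1 - 1 + 1 - ... oscillate and have no limit.
print(partial_sums([(-1) ** k for k in range(8)]))

# For x just below 1 the series converges to 1/(1 + x), which is close to 1/2.
for x in (0.9, 0.99):
    print(x, alternating_power_sum(x, 2000), 1 / (1 + x))
```

Euler's value 1/2 corresponds to the second computation pushed to x = 1, whereas the modern definition attends only to the first.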

Probably the most famous eighteenth-century cri- 
tique of the language and methods of eighteenth-cen- 
tury analysis is due to the philosopher George Berke- 
ley (1685-1753). Berkeley’s motto, “To be is to be per- 
ceived,” expresses his idealist stance, which was cou- 
pled with a strong view that the abstraction of individ- 
ual qualities, for the purposes of philosophical discus- 
sion, is impossible. The objects of philosophy should 
thus be things that are perceived, and perceived in 
their entirety. The impossibility of perceiving infinites- 
imally small objects, combined with their manifestly 
abstracted nature, led him to attack their use in his 
1734 treatise The Analyst: Or, a Discourse Addressed 
to an Infidel Mathematician. Referring sarcastically in 
1734 to infinitesimals as the “ghosts of departed quantities,”
Berkeley argued that neglecting some quantity,
no matter how small, was inappropriate in mathemati- 
cal argument. He quoted Newton in this regard, to the 
effect that “in mathematical matters, errors are to be
condemned, no matter how small.” Berkeley continued, 
saying that “[n]othing but the obscurity of the subject”
could have induced Newton to impose this kind
of reasoning on his followers. Such remarks, while they 
apparently did not dissuade those enamored of the 
methods, contributed to a sentiment that aspects of the 
calculus required deeper explanation. Writers such as 
Euler, d’Alembert, Lazare Carnot, and others attempted 
to address foundational criticisms by clarifying what 
differentials were, and gave a variety of arguments to 
justify the operations of the calculus. 

2.1 Euler 

Euler contributed to the general development of analy- 
sis more than any other individual in the eighteenth 
century, and his approaches to justifying his arguments 
were enormously influential even after his death, owing 
to the success and wide use of his important textbooks.
Euler’s reasoning is sometimes regarded as rather careless
since he operated rather freely with the notation of
the calculus, and many of his arguments are certainly 
deficient by later standards. This is particularly true 
of arguments involving infinite series and products. A 
typical example is provided by an early version of his 
proof that

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$

His method is as follows. Using the known series expansion
for sin x he considered the zeros of

$$\frac{\sin\sqrt{x}}{\sqrt{x}} = 1 - \frac{x}{3!} + \frac{x^2}{5!} - \frac{x^3}{7!} + \cdots.$$

These lie at $\pi^2, (2\pi)^2, (3\pi)^2, \ldots$. Applying (without
argument) the factor theorem for finite algebraic
equations he expressed this equation as

$$\left(1 - \frac{x}{\pi^2}\right)\left(1 - \frac{x}{4\pi^2}\right)\left(1 - \frac{x}{9\pi^2}\right)\cdots = 0.$$

Now, it can be seen that the coefficient of x in the infinite
sum, $-\tfrac{1}{6}$, should equal the negative of the sum
of the coefficients of x in the product. Euler apparently
concluded this by imagining multiplying out the
infinitely many terms and selecting the 1 from all but
one of them. This gives

$$\frac{1}{\pi^2} + \frac{1}{4\pi^2} + \frac{1}{9\pi^2} + \cdots = \frac{1}{6},$$

and multiplying both sides by $\pi^2$ gives the required
sum.
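
Both halves of Euler's argument can be tested numerically, which is presumably part of why the result was believed before it was rigorously proved. The sketch below truncates the sum and the product after finitely many terms; the function names and the sample point x = 2 are choices made here for illustration.

```python
import math


def basel_partial_sum(n: int) -> float:
    """1/1^2 + 1/2^2 + ... + 1/n^2."""
    return sum(1.0 / (k * k) for k in range(1, n + 1))


def euler_product(x: float, factors: int) -> float:
    """Finite truncation of (1 - x/pi^2)(1 - x/(4 pi^2))(1 - x/(9 pi^2))..."""
    result = 1.0
    for k in range(1, factors + 1):
        result *= 1.0 - x / (k * math.pi) ** 2
    return result


target = math.pi ** 2 / 6
print(target - basel_partial_sum(100000))   # small and positive

x = 2.0
print(math.sin(math.sqrt(x)) / math.sqrt(x), euler_product(x, 100000))
```

Numerical agreement supports the factorization but does not justify it; that the infinite product converges to the function is exactly the step that later required proof.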

We now think of this approach as having several 
problems. The product of the infinitely many terms 
may or may not represent a finite value, and today 
we would specify conditions for when it does. Also, 
applying a result about (finite) polynomials to (infi- 
nite) power series is a step that requires justification. 
Euler himself was to provide alternative arguments 
for this result later in his life. But the fact that he
may have known counterexamples (situations in which
such usages would not work) was not, for him, a decisive
obstacle. This view, in which one reasoned in a
generic situation that might admit a few exceptions, 
was common at his time, and it was only in the late 
nineteenth century that a concerted effort was made to 
state the results of analysis in ways that set out precisely
the conditions under which the theorems would
hold. 

Euler did not dwell on the interpretation of infinite 
sums or infinitesimals. Sometimes he was happy to 
regard differentials as actually equal to zero, and to
derive the meaning of a ratio of differentials from the
context of the problem: 

An infinitely small quantity is nothing but a vanish- 
ing quantity and therefore will be actually equal to 
0. . . . Hence there are not so many mysteries hidden in 
this concept as there are usually believed to be. These 
supposed mysteries have rendered the calculus of the 
infinitely small quite suspect to many people. 

This statement, from the Institutiones Calculi Differentialis
of 1755, was followed by a discussion of proportions
in which one of the ratios is 0/0, and a justification
of the fact that differentials may be neglected
in calculations with ordinary numbers. This accurately
describes a good deal of his practice, when he worked
with differential equations, for example.

Controversial matters did arise, however, and 
debates about definitions were not unusual. The best- 
known example involves discussions connected with 
the so-called vibrating string problem, which involved 
Euler, d’Alembert, and Daniel Bernoulli. These were 
closely connected with the definition of functions 
[1.2 §2.2], and the question of which functions studied 
by analysis actually could be represented by series (in 
particular trigonometric series). The idea that a curve 
of arbitrary shape could serve as an initial position for 
a vibrating string extended the idea of function, and
the work of Fourier [VI.25] in the early nineteenth
century made such functions analytically accessible. In
this context, functions with broken graphs (a kind of
discontinuous function) came under inspection. Later,
how to deal with such functions would be a decisive 
issue for the foundations of analysis, as the more “nat- 
ural” objects associated with algebraic operations and 
trigonometry gave way to the more general modern 
concept of function.

2.2 Responses from the Late Eighteenth Century 

One significant response to Berkeley in Britain was that 
of Colin Maclaurin (1698-1746), whose 1742 textbook 
A Treatise of Fluxions attempted to clarify the foun- 
dations of the calculus and do away with the idea 
of infinitely small quantities. Maclaurin, a leading fig- 
ure of the Scottish Enlightenment of the mid eigh- 
teenth century, was the most distinguished British 
mathematician of his time and an ardent proponent 
of Newton’s methods. His work, unlike that of many 
of his British contemporaries, was read with interest 
on the Continent, especially his elaborations of Newtonian
celestial mechanics. Maclaurin attempted to base
his reasoning on the notion of the limits of what he
termed “assignable” finite quantities. Maclaurin’s work 
is famously obscure, though it did provide examples of 
calculating the limits of ratios. Perhaps his most impor- 
tant contribution to the clarification of the foundations 
of analysis was his influence on d’Alembert. 

D’Alembert had read both Berkeley and Maclaurin 
and followed them in rejecting infinitesimals as real 
quantities. While exploring the idea of a differential as 
a limit, he also attempted to reconcile his idea with the 
idea that infinitesimals may be consistently regarded 
as being actually zero, perhaps in a nod to Euler’s 
view. The main exposition of d’Alembert’s views may 
be found in the Encyclopédie, in the articles on dif- 
ferentials (published in 1754) and on limits (1765). 
D’Alembert argued for the importance of geometric 
rather than algebraic limits. His meaning seems to have 
been that the quantities being investigated should not 
be treated merely formally, by substitution and sim- 
plification. Rather, a limit should be understood as the 
limit of a length (or collection of lengths), area, or other 
dimensioned quantity, in much the way that a circle 
may be seen as a limit of inscribed polygons. His aim 
seems primarily to have been to establish the reality 
of the objects described by existing algorithms, since 
the actual calculations he employs are carried out with 
differentials. 

2.2.1 Lagrange 

In the course of the eighteenth century, the differential
and the integral calculus gradually distinguished themselves
as a set of methods distinct from their applications
in mechanics and physics. At the same time, the
primary focus of the methods moved away from geometry,
so that in work of the second half of the eighteenth
century we increasingly see calculus treated as “algebraic
analysis” of “analytic functions.” The term “analytic”
was used in a variety of senses. For many writers,
such as Euler, it merely referred to a function (that is, a
relationship between variable quantities) that is given
by a single expression of the type used in analysis.

Lagrange provided a foundation for the calculus that 
was indebted to this algebraic viewpoint. Lagrange 
concentrated on power-series expansions as the basic 
entity of analysis, and through his work the term analytic
function evolved toward its more recent meaning
connected with the existence of a convergent Taylor
series representation. His approach reached a full
expression in his Théorie des Fonctions Analytiques of
1797. This was a version of his lectures at the École
Polytechnique, a new institution for the elite training
of military engineers in revolutionary France. Lagrange
assumed that a function must necessarily be expressible
as an infinite series of algebraic functions, basing
this argument on the existence of expansions for
known functions. He first sought to show that “in general”
no negative or fractional powers would appear
in the expansion, and from this he obtained a power- 
series representation. His arguments here are surpris- 
ing, and somewhat ad hoc, and I use an example given 
by Fraser (1987). The slightly strange notation is based 
on that of Lagrange. Suppose that one seeks an expansion
of $f(x) = \sqrt{x + i}$ in powers of i. In general, only
integer powers will be involved. Terms of the form $i^{m/n}$
do not make sense, says Lagrange, since the expression
of the function $\sqrt{x + i}$ is only two-valued, while $i^{m/n}$
has n values. Hence the series

$$\sqrt{x + i} = \sqrt{x} + pi + qi^2 + \cdots + ti^k + \cdots$$

obtains its two values from the term $\sqrt{x}$, and all other
powers must be integral. With fractional exponents set
aside, Lagrange argued that $f(x + i) = f(x) + i^a P(x, i)$,
with P finite for i = 0. Successive application of this
result gave him the expansion

$$f(x + i) = f(x) + pi + qi^2 + ri^3 + \cdots,$$

where i was a small increment. The number p depends
on x, so Lagrange defined a derived function $f'(x) = p(x)$.
The French term dérivée is the origin of the term
derivative, and in Lagrange’s language f is the “primitive”
of this derived function. Similar arguments can
be made to relate the higher coefficients to the higher
derivatives in the usual Taylor formula.
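
For the square-root example the first coefficients can be written down and checked. The values p = 1/(2 sqrt(x)) and q = -1/(8 x^(3/2)) used below are the standard binomial-series coefficients, supplied here as an assumption for the illustration rather than taken from Lagrange's text; the function name is likewise invented.

```python
import math


def lagrange_sqrt(x: float, i: float) -> float:
    """First three terms sqrt(x) + p*i + q*i^2 of the expansion of sqrt(x + i)."""
    p = 1.0 / (2.0 * math.sqrt(x))    # coefficient of i: the derivative of sqrt at x
    q = -1.0 / (8.0 * x ** 1.5)       # coefficient of i^2: half the second derivative
    return math.sqrt(x) + p * i + q * i * i


x = 4.0
for i in (0.1, 0.01, 0.001):
    print(i, math.sqrt(x + i) - lagrange_sqrt(x, i))   # shrinks roughly like i^3
```

The coefficient p here coincides with the derivative of the square root at x, in line with Lagrange's definition of the derived function as the coefficient of i.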

This approach, which seems oddly circular to modern
eyes, relied on the eighteenth-century distinction
between the “algebraic” infinite process of the series
expansion on the one hand, and the use of differentials
on the other. Lagrange did not see the original
series expansion as based on the limit process. With 
the renewed emphasis on limits and modern defini- 
tions developed by Cauchy, this approach was soon to 
be regarded as untenable. 

3 The First Half of the Nineteenth Century

3.1 Cauchy

Many writers contributed to discussions on rigor in 
analysis in the first decades of the nineteenth century. 
It was Cauchy who was to revive the limit approach to
greatest effect. His aim was pedagogical, and his ideas
were probably worked out in the context of preparing
his introductory lectures for the École Polytechnique at
the beginning of the 1820s. Although the students were
the best in France in scholarly ability, many found the
approach too difficult. As a result, while Cauchy him- 
self continued to use his methods, other instructors 
held on to older approaches using infinitesimals, which 
they found more intuitively accessible for the students 
as well as better adapted to the solution of problems 
in elementary mechanics. Cauchy’s self-imposed exile 
from Paris in the 1830s further limited the impact of 
his approach, which was initially taken up only by a 
few of his students. 

Nonetheless, Cauchy’s definitions of limit, of conti- 
nuity, and of the derivative gradually came into gen- 
eral use in France, and were influential elsewhere as 
well, especially in Italy. Moreover, his methods of using 
these definitions in proofs, and particularly his use of 
mean-value theorems in various forms, moved analysis 
from a collection of symbolic manipulations of quanti- 
ties with special properties toward the science of argu- 
ment about infinite processes using close estimation 
via the manipulation of inequalities. 

In some respects, Cauchy’s greatest contribution lay 
in his clear definitions. For earlier writers, the sum 
of an infinite series was a somewhat vague notion, 
sometimes interpreted by a kind of convergence argu- 
ment (as with the sum of a geometric series such as 
$\sum_{n=0}^{\infty} 2^{-n}$) and sometimes as the value of the function
from which the series was derived (as Euler, for example,
often regarded it). Cauchy revised the definition to
state that the sum of an infinite series was the limit
of the sequence of partial sums. This provided a unified
approach for series of numbers and series of functions,
an important step in the move to base calculus
and analysis on ideas about real numbers. This trend,
eventually dominant, is often referred to as the “arithmetization
of analysis.” Similarly, a continuous function
is one for which “an infinitely small increase of
the variable produces an infinitely small increase of the
function itself” (Cauchy 1821, pp. 34-35).

As we see from the example just given, Cauchy did 
not shy away from infinitely small quantities, nor did 
he analyze this notion further. The limit of a variable 
quantity is defined in a way that we would now regard 
as conversational, or heuristic: 

When the values that are successively assigned to a
given variable approach a fixed value indefinitely, in
such a way that it ends up differing from it as little as
one wishes, this latter value is called the limit of all the
others. Thus, for example, an irrational number is the
limit of the various fractions that provide values that
are closer and closer to it.

Cauchy (1821, p. 4) 


These ideas were not completely rigorous by modern 
standards, but he was able to use them to provide a 
unified foundation for the basic processes of analysis. 

This use of infinitely small quantities appears, for
example, in his definition of a continuous function. To
paraphrase his definition, suppose that a function f(x)
is single-valued on some finite interval of the real line,
and choose any value $x_0$ inside the interval. If the value
of $x_0$ is increased to $x_0 + a$, the function also changes
by the amount $f(x_0 + a) - f(x_0)$. Cauchy says that
the function f is continuous for this interval if, for
each value of $x_0$ in that interval, the numerical value of
the difference $f(x_0 + a) - f(x_0)$ decreases indefinitely
to 0 with a. In other words, Cauchy defines continuity
as a property on an interval rather than at a point,
in essence by saying that on that interval infinitely
small changes in the argument produce infinitely small
changes in the function value. Cauchy appears to have
considered continuity to be a property of a function on
an interval.

This definition emphasizes the importance of jumps 
in the value of the function for the understanding of
its properties, something that Cauchy had encountered 
early in his career when discussing the fundamental 
theorem of calculus [1.3 §5.5]. In his 1814 memoir 
on definite integrals, Cauchy stated: 

If the function $\phi(z)$ increases or decreases in a continuous
manner between $z = b'$ and $z = b''$, the
value of the integral [$\int_{b'}^{b''} \phi'(z)\,dz$] will ordinarily be
represented by $\phi(b'') - \phi(b')$. But if ... the function
passes suddenly from one value to another sensibly
different . . . the ordinary value of the integral must be
diminished.

Oeuvres (volume I, pp. 402-3) 


In his lectures, Cauchy assumed continuity when
defining the definite integral. He considered first of all
a division of the interval of integration into a finite
number of subintervals on which the function is either
increasing or decreasing. (This is not possible for all
functions, but this appeared not to concern Cauchy.)
He then defined the definite integral as the limit of the sum
$S = (x_1 - x_0) f(x_0) + (x_2 - x_1) f(x_1) + \cdots + (X - x_{n-1}) f(x_{n-1})$
as the number n becomes very
large. Cauchy gives a detailed argument for the existence
of this limit, using his theorem of the mean and
the fact of continuity.
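
The sum S is easy to compute for a concrete continuous function. The sketch below makes a simplifying assumption that is not in Cauchy's definition, namely that the n subintervals have equal width; the function name `cauchy_sum` and the test function are chosen here for illustration.

```python
def cauchy_sum(f, x0: float, X: float, n: int) -> float:
    """(x1 - x0) f(x0) + ... + (X - x_{n-1}) f(x_{n-1}) on n equal subintervals."""
    width = (X - x0) / n
    # Each term uses the value of f at the left end of its subinterval.
    return sum(width * f(x0 + k * width) for k in range(n))


# For f(x) = x^2 on [0, 1] the sums approach 1/3 as n grows.
for n in (10, 100, 1000, 10000):
    print(n, cauchy_sum(lambda x: x * x, 0.0, 1.0, n))
```

Watching the sums settle down for one function is of course no substitute for the existence argument that Cauchy gave for continuous functions in general.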

Versions of the main subjects of Cauchy’s lectures
were published in 1821 and 1823. Every student at the
École Polytechnique would have been aware of them
subsequently, and many would have used them explicitly.
They were joined in 1841 by a version of the course
elaborated by Cauchy’s associate, the Abbé Moigno.
They were referred to frequently in France and the definitions
employed by Cauchy became standard there.
We also know that the lectures were studied by others,
notably by Abel [VI.33] and Dirichlet [VI.36], who
spent time in Paris in the 1820s, and by Riemann
[VI.49].

Cauchy’s movement away from the formal approach 
of Lagrange rejected the “vagueness of algebra.” Al- 
though he was clearly guided by intuition (both geo- 
metric and otherwise), he was well aware that intu- 
ition could be misleading, and produced examples to 
show the value of adhering to precise definitions. One 
famous example, the function that takes the value
$e^{-1/x^2}$ when $x \neq 0$ and zero when x = 0, is differentiable
infinitely many times, yet it does not yield
a Taylor series that converges to the function at the
origin. Despite this example, which he mentioned in
his lectures, Cauchy was not a specialist in counterexamples,
and in fact the trend toward producing
counterexamples for the purpose of clarifying defini- 
tions was a later development. 
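
To see why this example is instructive, the following sketch may help. It uses modern notation rather than Cauchy's own, and the polynomial name $P_n$ is introduced here only for the sake of the argument. It indicates why every derivative of the function vanishes at the origin, so that the Taylor series there is identically zero.

```latex
% Sketch only; assumes the amsmath package. P_n is a hypothetical name.
\[
  f(x) = \begin{cases} e^{-1/x^2}, & x \neq 0, \\ 0, & x = 0. \end{cases}
\]
% For x \neq 0, induction on n gives, for some polynomial P_n,
\[
  f^{(n)}(x) = P_n\!\left(\tfrac{1}{x}\right) e^{-1/x^2}.
\]
% Since t^m e^{-t^2} \to 0 as |t| \to \infty for every m, taking t = 1/h
% and assuming inductively that f^{(n-1)}(0) = 0:
\[
  f^{(n)}(0) = \lim_{h \to 0} \frac{f^{(n-1)}(h) - f^{(n-1)}(0)}{h}
             = \lim_{h \to 0} \frac{1}{h}\, P_{n-1}\!\left(\tfrac{1}{h}\right) e^{-1/h^2} = 0.
\]
% Hence the Taylor series at 0 is \sum_n 0 \cdot x^n = 0, although f(x) > 0 for x \neq 0.
```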

Abel famously drew attention to an error in Cauchy’s 
work: his statement that a convergent series of contin-
uous functions has a continuous sum. For this to be
true, the series must be uniformly convergent, and in 
1826 Abel gave as a counterexample the series 

$$\sum_{k=1}^{\infty} (-1)^{k+1} \frac{\sin kx}{k},$$

which is discontinuous at odd multiples of $\pi$. Cauchy
was led to make this distinction only much later, after
the phenomenon had been identified by several writers.
Historians have written extensively about this apparent 
error; one influential account, due to Bottazzini, pro- 
poses that for various reasons Cauchy would not have 
found Abel’s example telling, even if he had known of 
it at the time (Bottazzini 1990, p. LXXXV).
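
In modern terms, the behavior of Abel's series near $x = \pi$ can be sketched as follows. The sketch assumes the standard Fourier expansion of $x/2$, a fact that is not spelled out in the account above; it is meant only to show how continuous partial sums can have a discontinuous limit.

```latex
% Sketch only; assumes amsmath. Each partial sum is continuous, the limit is not.
\[
  \sum_{k=1}^{\infty} (-1)^{k+1}\,\frac{\sin kx}{k} = \frac{x}{2}
  \qquad (-\pi < x < \pi).
\]
% At x = \pi every term vanishes, because \sin k\pi = 0, so the sum there is 0.
% The sum has period 2\pi, so the one-sided limits at x = \pi are
\[
  \lim_{x \to \pi^-} \frac{x}{2} = \frac{\pi}{2},
  \qquad
  \lim_{x \to \pi^+} \frac{x - 2\pi}{2} = -\frac{\pi}{2}.
\]
```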

Before leaving the time of Cauchy, we should note 
the related independent activity of Bolzano [VI.28].
Bolzano, a Bohemian priest and professor whose ideas
were not widely disseminated at the time, investigated
the foundations of the calculus extensively. In 1817,
for example, he gave what he termed a “purely ana- 
lytic proof of the theorem that between any two values 
that possess opposite signs, at least one real root of 
the equation exists”: the intermediate value theorem. 
Bolzano also studied infinite sets: what is now called 
the Bolzano-Weierstrass theorem states that in every
bounded infinite set there is at least one point having
the property that any disk about that point contains
infinitely many points of the set. Such “limit points”
were studied independently by Weierstrass [VI.44].
By the 1870s, Bolzano’s work became more broadly 
known. 

3.2 Riemann, the Integral, and Counterexamples 

Riemann is indelibly associated with the foundations of 
analysis because of the Riemann integral, which is part 
of every calculus course. Despite this, he was not always 
driven by issues involving rigor. Indeed he remains a 
standard example of the fruitfulness of nonrigorous 
intuitive invention. There are many points in Riemann’s 
work at which issues about rigor arise naturally, and 
the wide interest in his innovations did much to direct
the attention of researchers to making these insights 
precise. 

Riemann’s definition of the definite integral was pre-
sented in his 1854 Habilitationsschrift, the “second the-
sis,” which qualified him to lecture at a university for
fees. He generalized Cauchy’s notion to functions that
are not necessarily continuous. He did this as part of an
investigation of Fourier series [III.27] expansions. The
extensive theory of such series was devised by Fourier
in 1807 but not published until the 1820s. A Fourier
series represents a function in the form

$$f(x) = a_0 + \sum_{n=1}^{\infty} (a_n \cos(nx) + b_n \sin(nx))$$
on a finite interval.

The immediate inspiration for Riemann’s work was
Dirichlet [VI.36], who had corrected and developed
earlier faulty work by Cauchy on the question of when
and whether the Fourier series expansion of a function
converges to the function from which it is derived. In
1829 Dirichlet had succeeded in proving such conver-
gence for a function with period $2\pi$ that is integrable
on an interval of that length, does not possess infinitely 
many maxima and minima there, and at jump discon- 
tinuities takes on the average value between the two 
limiting values on each side. As Riemann noted, follow- 
ing his professor Dirichlet, “this subject stands in the 
closest connection to the principles of infinitesimal cal-
culus, and can therefore serve to bring these to greater 
clarity and definiteness” (Riemann 1854, p. 238). Rie- 
mann sought to extend Dirichlet’s investigations to fur- 
ther cases, and was thus led to investigate in detail 
each of the conditions given by Dirichlet. Accordingly, 
he generalized the definition of a definite integral as 
follows: 

We take between $a$ and $b$ an increasing sequence of
values $x_1, x_2, \ldots, x_{n-1}$, and for brevity designate
$x_1 - a$ by $\delta_1$, $x_2 - x_1$ by $\delta_2$, ..., $b - x_{n-1}$ by
$\delta_n$, and by $\varepsilon$ a positive proper fraction. Then the
value of the sum

$$S = \delta_1 f(a + \varepsilon_1\delta_1) + \delta_2 f(x_1 + \varepsilon_2\delta_2)
+ \delta_3 f(x_2 + \varepsilon_3\delta_3) + \cdots + \delta_n f(x_{n-1} + \varepsilon_n\delta_n)$$

depends on the choice of the intervals $\delta$ and the quanti-
ties $\varepsilon$. If it has the property that it approaches infinitely
closely a fixed limit $A$ no matter how the $\delta$ and $\varepsilon$ are
chosen, as $\delta$ becomes infinitely small, then we call this
value $\int_a^b f(x)\,dx$.
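
Riemann's wording can be restated in present-day notation roughly as follows. This is a sketch under stated assumptions: the endpoint labels $x_0 = a$ and $x_n = b$ and the symbols $\eta$ and $d$ are names introduced here for convenience and do not appear in Riemann's text.

```latex
% Sketch only; assumes amsmath. x_0 = a, x_n = b, \eta and d are names introduced here.
\[
  a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b,
  \qquad \delta_i = x_i - x_{i-1},
  \qquad 0 \le \varepsilon_i \le 1,
\]
\[
  S = \sum_{i=1}^{n} \delta_i\, f(x_{i-1} + \varepsilon_i \delta_i).
\]
% "S approaches A however the \delta and \varepsilon are chosen" then reads:
\[
  \forall \eta > 0 \;\; \exists d > 0 : \quad
  \max_i \delta_i < d \;\Longrightarrow\; |S - A| < \eta
  \quad \text{for every admissible choice of } x_i, \varepsilon_i,
\]
% and in that case one writes A = \int_a^b f(x)\,dx.
```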

In connection with this definition of the integral, and 
in part to show its power, Riemann provided an exam- 
ple of a function that is discontinuous in any inter- 
val, yet can be integrated. The integral thus has points 
of nondifferentiability on each interval. Riemann’s def- 
inition rendered problematic the inverse relationship 
between differentiation and integration, and his exam- 
ple brought this problem out clearly. The role of such 
“pathological” counterexamples in pushing the devel- 
opment of rigor, already apparent in Cauchy’s work, 
intensified greatly around this time. 

Riemann’s definition was published only in 1867, fol- 
lowing his death; an expository version due to Gaston 
Darboux appeared in French in 1873. The populariza-
tion and extension of Riemann’s approach went hand
in hand with the increasing appreciation of the impor-
tance of rigor associated with the Weierstrass school,
discussed below. Riemann’s approach focused atten-
tion on sets of points of discontinuity, and thus was
seminal for Cantor’s [VI.54] investigations into point
sets in the 1870s and afterwards.

The use of the Dirichlet principle serves as a fur- 
ther example of the way in which Riemann’s work drew 
attention to problems in the foundations of analysis. 
In connection with his research into complex analy- 
sis, Riemann was led to investigate solutions to the 
so-called Dirichlet problem: given a function g, defined 
on the boundary of a closed region in the plane, does 
there exist a function f that satisfies the Laplace
partial differential equation [I.3 §5.4] in the inte-
rior and takes the same values as g on the bound- 
ary? Riemann asserted that the answer was yes. To 
demonstrate this, he reduced the question to prov- 
ing the existence of a function that minimizes a cer- 
tain integral over the region, and argued on physical 
grounds that such a minimizing function must always 
exist. Even before Riemann’s death his assertion was 
questioned by Weierstrass [VI.44], who published a
counterexample in 1870. This led to attempts to refor- 
mulate Riemann’s results and prove them by other 
means, and ultimately to a rehabilitation of the Dirich- 
let principle through the provision of precise and broad 
hypotheses for its validity, which were expressed by 
Hilbert [VI.63] in 1900.

4 Weierstrass and His School 

Weierstrass had a passion for mathematics as a stu-
dent at Bonn and Münster, but his student career was
very uneven. He spent the years from 1840 to 1856
as a high-school teacher, undertaking research inde-
pendently but at first publishing obscurely. Papers
from 1854 onward in Journal für die reine und ange-
wandte Mathematik (otherwise known as Crelle’s Jour-
nal) attracted wide attention to his talent, and he
obtained a professorship in Berlin in 1856. Weierstrass
began to lecture regularly on mathematical analysis,
and his approach developed into a series of four
courses of lectures given cyclically between the early
1860s and 1890. The lectures evolved over time and
were attended by a large number of important math-
ematical researchers. They also indirectly influenced
many others through the circulation of unpublished
notes. This circle included R. Lipschitz, P. du Bois-
Reymond, H. A. Schwarz, O. Hölder, Cantor, L. Koenigs-
berger, G. Mittag-Leffler, Kovalevskaya [VI.59], and
L. Fuchs, to name only some of the most important.
Through their use of Weierstrassian approaches in their
own research, and their espousal of his ideas in their
own lectures, these approaches became widely used
well before the eventual publication of a version of
his lectures late in his life. The account that follows
is based largely on the 1878 version of the lectures. His
approach was also influential outside Germany: parts of
it were absorbed in France in the lectures of Hermite
[VI.47] and Jordan [VI.52], for example.

Weierstrass’s approach builds on that of Cauchy 
(though the detailed relationship between the two bod- 
ies of work has never been fully examined). The two 
overarching themes of Weierstrass’s approach are, on
the one hand, the banning of the idea of motion, or
changing values of a variable, from limit processes, and, 
on the other, the representation of functions, notably 
of a complex variable. The two are intimately linked. 
Essential to the motion-free definition of a limit is 
Weierstrass’s nascent investigation of what we would 
now call the topology of the real line or complex plane, 
with the idea of a limit point, and a clear distinction 
between local and global behavior. The central objects 
of study for Weierstrass are functions (of one or more 
real or complex variable quantities), but it should be 
borne in mind that set theory is not involved, so that 
functions are not to be thought of as sets of ordered 
pairs. 

The lectures begin with a now-familiar subject: the
development of rational, negative, and real numbers 
from the integers. For example, negative numbers are 
defined operationally by making the integers closed 
under the operation of subtraction. He attempted a 
unified approach to the definition of rational and irra-
tional numbers which involved unit fractions and dec- 
imal expansions and now seems somewhat murky. 
While Weierstrass’s definition of the real numbers 
appears unsatisfactory to modern eyes, the general 
path of arithmetization of analysis was established 
by this approach. In parallel to the development of 
number systems, he also developed different classes 
of functions, building them up from rational func- 
tions by using power-series representations. Thus, in 
Weierstrass’s approach, a polynomial (called an inte-
ger rational function) is generalized to a “function of
integer character,” which means a function with a con-
vergent power series expansion everywhere. The Weier-
strass factorization theorem asserts that any such
function may be written as a (possibly infinite) product
of certain “prime” functions and exponential functions 
with polynomial exponents of a certain type. 

The limit definition given by Weierstrass has thor- 
oughly modern features: 

That a variable quantity $x$ becomes infinitely small
simultaneously with another quantity $y$ means: “After
the assumption of an arbitrarily small quantity $\varepsilon$ a
bound $\delta$ for $x$ may be found, such that for every value
of $x$ for which $|x| < \delta$, the corresponding value of $|y|$
will be less than $\varepsilon$.”

Weierstrass (1988, p. 57) 
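
Written with quantifiers, which are a modern notational convenience and not part of Weierstrass's own formulation, the definition quoted above and its application to continuity might be sketched as follows. The symbols $h$, $f$, and $x_0$ are introduced here purely for illustration.

```latex
% Sketch only; assumes amsmath. The quantifier notation is modern, not Weierstrass's.
\[
  \forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x : \quad
  |x| < \delta \;\Longrightarrow\; |y| < \varepsilon .
\]
% Taking x = h and y = f(x_0 + h) - f(x_0) gives continuity of f at x_0:
\[
  \forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall h : \quad
  |h| < \delta \;\Longrightarrow\; |f(x_0 + h) - f(x_0)| < \varepsilon .
\]
```

No variable "moves" in either statement: each is a static claim about linked inequalities.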

Weierstrass immediately used this definition to give 
a proof of continuity for rational functions of sev-
eral variables, using an argument that could appear
in a textbook today. The former notions of variables 
tending to given values were replaced by quantified 
statements about linked inequalities. The framing of 
hypotheses in terms of inequalities became a guiding 
motif in the work of Weierstrass’s school: here we men-
tion in passing the Lipschitz and Hölder conditions in
the existence theory for differential equations. The clar- 
ity that this language gave to problems involving the 
interchange of limits, for example, meant that previ-
ously intractable problems could now be handled in
a routine way by those inculcated in the Weierstrass 
approach. 

The fact that general functions were built from
rational functions using series expansions gave the lat-
ter a key role in Weierstrass’s work, and as early as
1841 he had identified the importance of uniform con-
vergence. The distinction between uniform and point-
wise convergence was made very clearly in his lec-
tures. A series converges, as it does for Cauchy, if
its sequence of partial sums converges, though now
the convergence is phrased in the following terms: the
series $\sum f_n(x)$ converges to $s_0$ at $x = x_0$ if, given an
arbitrary positive $\varepsilon$, there is an integer $N$ such that
$|s_0 - (f_1(x_0) + f_2(x_0) + \cdots + f_n(x_0))| < \varepsilon$ for every
$n > N$. The convergence is uniform on a domain of the
variable if the same $N$ will work for that $\varepsilon$ value for all
$x$ in the domain. Uniform convergence guarantees con-
tinuity of the sum, since these are series of rational, 
hence continuous, functions. From this point of view, 
then, uniform convergence is important well beyond 
the context of trigonometric series (important though 
those may be). Indeed, it is a central tool of the entire 
theory of functions. 
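
The difference between the two notions lies only in the order of the quantifiers, as the following sketch is meant to show. The names $s_n$ (the $n$th partial sum), $s$ (the sum), and $D$ (the domain) are assumptions introduced here, and the example $s_n(x) = x^n$ is a standard textbook illustration rather than one taken from the lectures.

```latex
% Sketch only; assumes amsmath. s_n denotes the n-th partial sum, s the sum, D the domain.
% Pointwise convergence on D: N may depend on both \varepsilon and x.
\[
  \forall x \in D \;\; \forall \varepsilon > 0 \;\; \exists N \;\; \forall n > N : \quad
  |s(x) - s_n(x)| < \varepsilon .
\]
% Uniform convergence on D: one N serves every x at once.
\[
  \forall \varepsilon > 0 \;\; \exists N \;\; \forall n > N \;\; \forall x \in D : \quad
  |s(x) - s_n(x)| < \varepsilon .
\]
% Standard example (not from the lectures): s_n(x) = x^n on D = [0,1].
% The pointwise limit is s(x) = 0 for x < 1 and s(1) = 1, which is discontinuous, and
\[
  \sup_{x \in [0,1]} |s(x) - s_n(x)| = 1 \quad \text{for every } n,
\]
% so the convergence is not uniform.
```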

Weierstrass’s role as a critic of rigor in the work of 
others, notably Riemann, has already been noted. More 
than any other leading figure, he generated counter- 
examples to illustrate difficulties with received notions 
and to distinguish between different kinds of analyti- 
cal behavior. One of his best-known examples was of 
an everywhere-continuous but nowhere-differentiable
function, namely $f(x) = \sum b^n \cos(a^n x)$, which is uni-
formly convergent for $b < 1$ but fails to be differ-
entiable at any $x$ if $ab > 1 + \frac{3}{2}\pi$. Similarly he con-
structed functions for which the Dirichlet principle 
fails, examples of sets constituting “natural bound- 
aries,” that is, obstacles to continuing series expan- 
sions into larger domains, and so forth. The careful 
distinctions he encouraged, and the very procedure
of seeking pathological rather than typical examples,
threw the spotlight on the precision of hypotheses in
analysis to an unprecedented degree. From the 1880s, 
with the maturity of this program, analysis no longer 
dealt with generic cases and looked instead for abso- 
lutely precise statements in a way that has for the most 
part endured to the present. This was also to become 
a pattern and an imperative in other areas of mathe- 
matics, though sometimes the passage from reasoning 
from generic examples to fully expressed hypotheses 
and definitions took decades. (Algebraic geometry pro- 
vides a famous example, one in which reasoning with 
generic cases lasted until the 1920s.) In this sense the 
form of rigorous argument and exposition espoused by 
Weierstrass and his school was to become a pattern for 
mathematics generally. 

4.1 The Aftermath of Weierstrass and Riemann 

Analysis became the model subdiscipline for rigor for 
a variety of reasons. Of course, analysis was important 
for the sheer volume and range of application of its
results. Not everyone agreed with the precise way in 
which Weierstrass approached foundational questions 
(through series, rational functions, and so on). Indeed, 
Riemann’s more geometric approach had attracted
followers, if not exactly a school, and the insights 
his approach afforded were enthusiastically embraced. 
However, any subsequent discussion had to take place 
at a level of rigor comparable to that which Weierstrass 
had attained. While approaches to the foundations of 
analysis were to vary, the idea that limits should be rig-
orously handled in much the way that Weierstrass did
was not to alter. Among the remaining central issues 
for rigor was the definition of the number systems. 

For the real numbers, probably the most success- 
ful definition (in terms of its later use) was provided 
by Dedekind [VI.50]. Dedekind, like Weierstrass, took
the integers as fundamental, and extended them to the 
rationals, noting that the algebraic properties satisfied 
by the latter are those satisfied by what we now call a 
field [I.3 §2.2]. (This idea is also Dedekind’s.) He then
showed that the rational numbers satisfy a trichotomy 
law. That is, each rational number x divides the entire 
collection into three parts: x itself, rational numbers 
greater than x, and rational numbers less than x. He 
also showed that the rationals greater and less than a 
given number extend to infinity, and that any rational 
corresponds to a distinct point on the number line.
However, he also observed that along that line there 
are infinitely many points that do not correspond to 
any rational. Using the idea that to every point on the
line there should correspond a number, he constructed
the remainder of the continuum (that is, the real line)
by the use of cuts. These are ordered pairs $(A_1, A_2)$ of
nonempty sets of rational numbers such that every ele-
ment of the first set is less than every element of the
second, and such that taken together they contain all
the rationals. Such cuts may obviously be produced by
an element $x$, in which case $x$ is either the greatest ele-
ment of $A_1$ or the least element of $A_2$. But sometimes
$A_1$ does not have a greatest element, or $A_2$ a least ele-
ment, and in that case we can use the cut to define a 
new number, which is necessarily irrational. The set of 
all such cuts may be shown to correspond to the points 
of the number line, so that nothing is left out. A critical 
reader might feel that this is begging the question, since 
the idea of the number line constituting a continuum 
in some way might seem to be a hidden premise. 
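
A concrete cut may make the construction easier to follow. The example below, the cut defining $\sqrt{2}$, is a standard illustration and is not quoted from Dedekind; the auxiliary rational $q'$ is a device introduced here to check that neither set has an extreme element.

```latex
% Sketch only; assumes amsmath and amssymb (for \mathbb). Standard example, not from Dedekind's text.
\[
  A_1 = \{\, q \in \mathbb{Q} : q \le 0 \ \text{or}\ q^2 < 2 \,\},
  \qquad
  A_2 = \{\, q \in \mathbb{Q} : q > 0 \ \text{and}\ q^2 > 2 \,\}.
\]
% No rational has square 2, so A_1 and A_2 together contain all of \mathbb{Q}.
% For a positive rational q put q' = (2q+2)/(q+2), which is again a positive rational. Then
\[
  q' - q = \frac{2 - q^2}{q + 2},
  \qquad
  (q')^2 - 2 = \frac{2\,(q^2 - 2)}{(q + 2)^2}.
\]
% If q^2 < 2 then q' > q and (q')^2 < 2, so A_1 has no greatest element;
% if q^2 > 2 then q' < q and (q')^2 > 2, so A_2 has no least element.
% The cut (A_1, A_2) therefore defines a new, irrational number: \sqrt{2}.
```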

Dedekind's construction stimulated a good deal of 
discussion, especially in Germany, about the best 
way to found the real numbers. Participants included 
E. Heine, Cantor, and the logician Frege [VI.56]. Heine
and Cantor, for example, considered real numbers as 
equivalence classes of Cauchy sequences of rationals, 
together with a machinery that permitted them to 
define the basic arithmetical operations. A very simi- 
lar approach was proposed by the French mathemati- 
cian Charles Méray. Frege, by contrast, in his 1884 Die 
Grundlagen der Arithmetik, sought to found the inte- 
gers on logic. While his attempts to construct the reals 
along these lines did not bear fruit, he had an impor- 
tant role in his insistence that the various construc-
tions should not merely be mathematically functional
but should also be demonstrably free from internal 
contradiction. 

Despite much activity on the foundations of the 
real numbers, infinite sets, and other basic notions for 
analysis, consensus remained elusive. For example, the 
influential Berlin mathematician Leopold Kronecker
[VI.48] denied the existence of the reals, and held that 
all true mathematics was to be based on finite sets. Like 
Weierstrass, with whom he worked and whom he influ- 
enced, he emphasized the strong analogies between the 
integers and the polynomials, and sought to use this 
algebraic foundation to build all of mathematics. Hence 
for Kronecker the entire main path of research in analy- 
sis was anathema, and he opposed it ardently. These 
views were influential, both directly and indirectly, on
a number of later writers, including Brouwer [VI.75],
the intuitionist school around him, and the algebraist
and number theorist Kurt Hensel. 

All efforts to found analysis were based in one way 
or another on an underlying notion (not always made 
explicit) of quantity. The foundational framework of 
analysis, however, was to shift over the period from 
1880 to 1910 toward the theory of sets. This had its 
origin in the work of Cantor, a student of Weierstrass 
who began studying discontinuities of Fourier series in 
the early 1870s. Cantor became concerned about how 
to distinguish between different types of infinite sets. 
His proofs that the rational numbers and the algebraic 
numbers are countable [III.11] while the reals are not
led him to a hierarchy of infinite sets of different car- 
dinality. The importance of this discovery for analysis 
was at first not widely recognized, though in the 1880s 
Mittag-Leffler and Hurwitz both made significant appli- 
cations of notions about derived sets (the set of limit 
points of a given set) and dense or nowhere-dense sets. 

Cantor gradually came to the view that set theory 
could function as a foundational tool for all of math- 
ematics. As early as 1882 he wrote that the science 
of sets encompassed arithmetic, function theory, and 
geometry, combining them into a “higher unity” based 
on the idea of cardinality. However, this proposal was 
vaguely articulated and at first attracted no adherents. 
Nonetheless, sets began to find their way into the lan-
guage of analysis, most notably through ideas of mea-
sure [III.57] and measurability of a set. Indeed, one
important route to the absorption of analysis by set 
theory was the path that sought to determine what kind 
of function could “measure” a set in an abstract sense. 
The work of Lebesgue [VI.72] and Borel [VI.70] around
1900 on integration and measurability tied set theory 
to the calculus in a very concrete and intimate way. 

A further key step in the establishment of the foun- 
dations of analysis in the early twentieth century was 
a new emphasis on mathematical theories as axiomatic 
structures. This received enormous impetus from the 
work of Hilbert, who, beginning in the 1890s, had 
sought to provide a renewed axiomatization of geom-
etry. Peano [VI.62] in Italy headed a school with simi-
lar aims. Hilbert redefined the reals on these axiomatic 
grounds, and his many students and associates turned 
to axiomatics with enthusiasm for the clarity the 
approach could provide. Rather than proving the exis- 
tence of specific entities such as the reals, the math- 
ematician posits a system satisfying the fundamental 
properties they possess. A real number (or whatever 
object) is then defined by the set of axioms provided. 


As Epple has pointed out, such definitions were con- 
sidered to be ontologically neutral in that they did not 
provide methods for telling real numbers from other 
objects, or even state whether they existed at all (Epple
2003, p. 316). Hilbert’s student Ernst Zermelo began
work on axiomatizing set theory along these lines, pub-
lishing his axioms in 1908 (see [IV.1 §3]). Problems with
set theory had emerged in the form of paradoxes, the
most famous due to Russell [VI.71]: if S is the set of
all sets that do not contain themselves, then it is not
possible for S to be in S, nor can it not be in S. Zer-
melo’s axiomatics sought to avoid this difficulty, in part
by avoiding the definition of set. By 1910, Weyl [VI.80]
was to refer to mathematics as the science of “$\in$,” or
set membership, rather than the science of quantity. 
Nonetheless, Zermelo’s axioms as a foundational strat- 
egy were contested. For one thing, a consistency proof 
for the axioms was lacking. Such “meaning-free” axiom- 
atization was also contested on the grounds that it 
removed intuition from the picture. 
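
Russell's paradox, and the way an axiom restricting set formation sidesteps it, can be written out in a few lines. This is only a sketch: the symbols $\varphi$ and $A$ are names introduced here for illustration, and the second display is the usual modern statement of Zermelo's separation axiom rather than a quotation from his 1908 paper.

```latex
% Sketch only; assumes amsmath. \varphi and A are names introduced for illustration.
\[
  S = \{\, x : x \notin x \,\}
  \quad\Longrightarrow\quad
  \bigl( S \in S \iff S \notin S \bigr),
\]
% which is contradictory. The separation axiom only permits subsets of a given set A,
\[
  \{\, x \in A : \varphi(x) \,\},
\]
% so the unrestricted collection S is not guaranteed to be a set.
```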

Against the complex and rapidly developing back- 
ground of mathematics in the early twentieth century, 
these debates took on many dimensions that have 
implications well beyond the question of what consti- 
tutes rigorous argument in analysis. For the practicing 
analyst, however, as well as for the teacher of basic 
infinitesimal calculus, these discussions are marginal
to everyday mathematical life and education, and are 
treated as such. Set theory is pervasive in the language 
used to describe the basic objects. Real-valued func- 
tions of one real variable are defined as sets of ordered 
pairs of real numbers, for example; a set-theoretic defi-
nition of an ordered pair was given by Wiener [VI.85] in
1914, and the set-theoretic definition of functions may 
be dated from that time. However, research in analysis 
has been largely distinct from, and generally avoids, the
foundational issues that may remain in connection with 
this vocabulary. This is not at all to say that contempo- 
rary mathematicians treat analysis in a purely formal 
way. The intuitive content associated with numbers and 
functions is very much a part of the way of thinking of
most mathematicians. The axioms for the reals and for 
set theory form a framework to be referred to when 
necessary. But the essential objects of basic analysis, 
namely derivatives, integrals, series, and their existence 
or convergence behaviors, are dealt with along the lines 
of the early twentieth century, so that the ontologi-
cal debates about the infinitesimal and infinite are no
longer very lively. 

A coda to this story is provided by the researches of
Robinson [VI.95] (1918-74) into “nonstandard” analy-
sis, published in 1961. Robinson was an expert in model
theory: the study of the relationship between systems
of logical axioms and the structures that may satisfy
them. His differentials were obtained by adjoining to
the regular real numbers a set of “differentials,” which
satisfied the axioms of an ordered field (in which there
is ordinary arithmetic like that of the real numbers) but
in addition had elements that were smaller than $1/n$
for every positive integer $n$. In the eyes of some, this
creation eliminated many of the unpleasant features of
the usual way of dealing with the reals, and realized the
ultimate goal of Leibniz to have a theory of infinitesi-
mals which was part of the same structure as that of
the reals. Despite stimulating a flurry of activity, and 
considerable acclaim from some quarters, Robinson’s 
approach has never been widely accepted as a working 
foundation for analysis. 

II.6 The Development of the Idea of Proof

Leo Corry 

1 Introduction and Preliminary Considerations

In many respects the development of the idea of proof 
is coextensive with the development of mathematics as 
a whole. Looking back into the past, one might at first 
consider mathematics to be a body of scientific know- 
ledge that deals with the properties of numbers, mag- 
nitudes, and figures, obtaining its justifications from 
proofs rather than, say, from experiments or induc- 
tive inferences. Such a characterization, however, is 
not without problems. For one thing, it immediately 
leaves out important chapters in the history of civiliza- 
tion that are more naturally associated with mathemat- 
ics than with any other intellectual activity. For exam- 
ple, the Mesopotamian and Egyptian cultures devel- 
oped elaborate bodies of knowledge that would most 
naturally be described as belonging to arithmetic or 
geometry, even though nothing is found in them that 
comes close to the idea of proof as it was later prac- 
ticed in mathematics at large. To the extent that any 
justification is given, say, in the thousands of math- 
ematical procedures found on clay tablets written in 
cuneiform script, it is inductive or based on experi- 
ence. The tablets repetitively show— without additional 
explanation or attempts at general justifications — a 
given procedure to be followed whenever one is pursu- 
ing a certain type of result. Later on, in the context of 
Chinese, Japanese, Mayan, or Hindu cultures, one again 
finds important developments in fields naturally asso-
ciated with mathematics. The extent to which these cul- 
tures pursued the idea of mathematical proof— a ques- 
tion that is debated among historians to this day— 
was undoubtedly not as great as it was in Greek tra-
dition, and it certainly did not take the specific forms
we typically associate with the latter. Should one nev- 
ertheless say that these are instances of mathematical 
knowledge, even though they are not justified on the
basis of some kind of general, deductive proof? If so, 
then we cannot characterize mathematics as a body of 
knowledge that is backed up by proofs, as suggested 
above. However, this litmus test certainly provides a 
useful criterion— one that we do not want to give up 
too easily— for distinguishing mathematics from other 
intellectual endeavors. 

Without totally ignoring these important questions,
the present account focuses on a story that started, 
at some point in the past, usually taken to be before 
or around the fifth century b.c.e. in Greece, with the 
realization that there was a distinctive body of claims, 
mainly associated with numbers and with diagrams, 
whose truth could be and needed to be vindicated in 
a very special way— namely, by means of a general, 
deductive argument, or “proof.” Exactly when and how 
this story began is unclear. Equally unclear are the 
direct historical sources of such a unique idea. Since the 
emphasis on the use of logic and reason in constructing 
an argument was well-entrenched in other spheres of 
public life in ancient Greece— such as politics, rhetoric, 
and law — much earlier than the fifth century b.c.e., it is 
possible that it is in those domains that the origins of 
mathematical proof are to be found. 

The early stages of this story raise some addi- 
tional questions, both historical and methodological. 
For instance, Thales of Miletus, the first mathemati- 
cian known by name (though he was also a philoso- 
pher and scientist), is reported to have proved sev- 
eral geometric theorems, such as, for instance, that the 
opposite angles between two intersecting straight lines 
are equal, or that if two vertices of a triangle are the 
endpoints of the diameter of a circle and the third is 
any other point on the circle then the triangle must be 
right angled. Even if we were to accept such reports at 
face value, several questions would immediately arise: 
in what sense can it be asserted that Thales “proved” 
these results? More specifically, what were Thales’s ini-
tial assumptions and what inference methods did he 
take to be valid? We know very little about this. How- 
ever, we do know that, as a result of a complex histor- 
ical process, a certain corpus of knowledge eventually 
developed that comprised known results, techniques 
employed, and problems (both solved and yet requir- 
ing solution). This corpus gradually also incorporated 
the regulatory idea of proof: that is, the idea that some 
kind of general argument, rather than an example (or 
even many examples), was the necessary justification to 
be sought in all cases. As part of this development, the 
idea of proof came to be associated with strictly deduc- 
tive arguments, as opposed to, say, dialogic (meaning 
“negotiated”) or “probabilistically inferred” truth. It is 
an interesting and difficult historical question to estab- 
lish why this was the case, and one that we will not 
address here. 

Euclid’s [VI.2] Elements was compiled some time
around the year 300 b.c.e. It stands out as the most suc-
cessful and comprehensive attempt of its kind to orga-
nize the basic concepts, results, proofs, and techniques 
required by anyone wanting to master this increasingly 
complex body of knowledge. Still, it is important to 
stress that it was not the only such attempt within the 
Hellenic world. This endeavor was not just a matter 
of compilation, codification, and canonization, such as 
one can find in any other evolving field of learning at
any point in time. Instead, the assertions it contained 
were of two different kinds, and the distinction was 
vitally important. On the one hand there were basic
assumptions, or axioms, and on the other there were 
theorems, which were typically more elaborate state- 
ments, together with accounts of how they followed 
from the axioms— that is, proofs. The way that proof 
was conceived and realized in the Elements became the 
paradigm for centuries to come. 

This article outlines the evolution of the idea of 
deductive proof as initially shaped in the framework 
of Euclidean-style mathematics and as subsequently 
practiced in the mainstream mathematical culture of 
ancient Greece, the Islamic world, Renaissance Europe, 
early modern European science, and then in the nine- 
teenth century and at the turn of the twentieth. The 
main focus will be on geometry: other fields, like arithmetic
and algebra, will be treated mainly in relation
to it. This choice is amply justified by the subject
matter itself. Indeed, much as mathematics stands
out among the sciences for the unique way in which
it relies on proof, so Euclidean-style geometry stood 
out— at least until well into the seventeenth century— 
among closely related disciplines such as arithmetic, 
algebra, and trigonometry. Individual results in these 
other disciplines, or indeed the domains as a whole, 
were often regarded as fully legitimate only when they 
had been provided with a geometric (or geometric-like) 
foundation. Important developments in nineteenth- 
century mathematics, mainly in connection with the 
rise of non-Euclidean geometries [II.2 §§6-10] and
with problems in the foundations of analysis [II.5], 
eventually led to a fundamental change of orientation, 
where arithmetic (and eventually set theory [IV.l]) 
became the bastion of certainty and clarity from which 
other mathematical disciplines, geometry included, 
drew their legitimacy and their clarity. (See the crisis
in the foundations of mathematics [II.7] for a
detailed account of this development.) And yet, even
before this fundamental change, Euclidean-style proof 
was not the only way in which mathematical proof was 
conceived, explored, and practiced. By focusing mainly
on geometry, the present account will necessarily leave
out important developments that eventually became 
the mainstream of legitimate mathematical knowledge. 
To mention just one important example in this regard, 
a fundamental question that will not be pursued here is 
how the principle of mathematical induction originated 
and developed, became accepted as a legitimate infer- 
ence rule of universal validity, and was finally codified 
as one of the basic axioms of arithmetic in the late nine- 
teenth century. Moreover, the evolution of the notion 
of proof involves many other dimensions that will not 
be treated here, such as the development of the inter- 
nal organization of mathematics into subdisciplines, 
as well as the changing interrelations between math- 
ematics and its neighboring disciplines. At a different 
level, it is related to how mathematics itself evolved as 
a socially institutionalized enterprise: we shall not dis- 
cuss interesting questions about how proofs are pro- 
duced, made public, disseminated, criticized, and often 
rewritten and improved. 

2 Greek Mathematics 

Euclid’s Elements is the paradigmatic work of Greek
mathematics, partly for what it has to say about the 
basic concepts, tools, results, and problems of syn- 
thetic geometry and arithmetic, but also for how it 
regards the role of a mathematical proof and the form 
that such a proof takes. All proofs appearing in the Ele- 
ments have six parts and are accompanied by a dia- 
gram. I illustrate this with the example of proposition
I.37. Euclid’s text is quoted here in the classical
translation of Sir Thomas Heath, and the meaning of 
some terms differs from current usage. Thus, two tri- 
angles are said to be “in the same parallels” if they have 
the same height and both their bases are contained in 
a single line, and any two figures are said to be “equal” 
if their areas are equal. For the sake of explanation, 
names of the parts of the proof have been added: these 
do not appear in the original. The proof is illustrated 
in figure 1. 

Protasis (enunciation). Triangles which are on the 
same base and in the same parallels are equal to one 
another. 

Ekthesis (setting out). Let ABC, DBC be triangles on 
the same base BC and in the same parallels AD, BC. 
Diorismos (definition of goal). I say that the triangle 
ABC is equal to the triangle DBC. 



Figure 1 Proposition I.37 of Euclid’s Elements.


Kataskeue (construction). Let AD be produced in both 
directions to E, F; through B let BE be drawn parallel 
to CA, and through C let CF be drawn parallel to BD.
Apodeixis (proof). Then each of the figures EBCA,
DBCF is a parallelogram; and they are equal, for they
are on the same base BC and in the same parallels 
BC, EF. Moreover the triangle ABC is half of the par- 
allelogram EBCA, for the diameter AB bisects it; and 
the triangle DBC is half of the parallelogram DBCF, 
for the diameter DC bisects it. Therefore the triangle 
ABC is equal to the triangle DBC. 

Sumperasma (conclusion). Therefore triangles which 
are on the same base and in the same parallels are
equal to one another. 

This is an example of a proposition that states a property
of geometric figures. The Elements also includes
propositions that express a task to be carried out. An 
example is proposition I.1: “On a given finite straight
line to construct an equilateral triangle.” The same six 
parts of the proof and the diagram invariably appear 
in propositions of this kind as well. This formal struc- 
ture is also followed in all propositions appearing in 
the three arithmetic books of the Elements and, most
importantly, all of them are always accompanied by a
diagram. Thus, for instance, consider proposition IX.35,
which in its original version reads as follows: 

If as many numbers as we please be in continued proportion,
and there be subtracted from the second and
the last numbers equal to the first, then, as the excess 
of the second is to the first, so will the excess of the 
last be to all those before it. 

This cumbersome formulation may prove incompre- 
hensible on first reading. In more modern terms, an 
equivalent to this theorem would state that, given a
geometric progression $a_1, a_2, \ldots, a_{n+1}$, we have
$(a_{n+1} - a_1) : (a_1 + a_2 + \cdots + a_n) = (a_2 - a_1) : a_1$.

Figure 2 Proposition IX.35 of Euclid’s Elements.

This translation, however, fails to convey the spirit of 
the original, in which no formal symbolic manipulation 
is, or can be, made. More importantly, a modern alge- 
braic proof fails to convey the ubiquity of diagrams in 
Greek mathematical proofs, even where they are not 
needed for a truly geometric construction. Indeed, the 
accompanying diagram for proposition IX.35 is shown
as figure 2 and the first few lines of the proof are as 
follows: 

Let there be as many numbers as we please in continued
proportion A, BC, D, EF, beginning from A as least
and let there be subtracted from BC and EF the num- 
bers BG, FH, each equal to A; I say that, as GC is to A, 
so is EH to A, BC, D. For let FK be made equal to BC and 
FL equal to D 

This proposition and its proof provide good exam- 
ples of the capabilities, as well as the limitations, of 
ancient Greek practices of notation, and especially of 
how they managed without a truly symbolic language. 
In particular, they demonstrate that proofs were never 
conceived by the Greeks, even ideally, as purely logical 
constructs, but rather as specific kinds of arguments 
that one applied to a diagram. The diagram was not 
just a visual aid to the argumentation. Rather, through 
the ekthesis part of the proof, it embodied the idea 
referred to by the general character and formulation 
of the proposition. 
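
In modern terms, the relation stated in proposition IX.35 can be checked for any particular geometric progression. The following Python sketch is only an illustration in present-day notation and is in no way a rendering of Euclid's diagram-based argument; the function name `ix35_holds` and the use of exact fractions are assumptions made here.

```python
from fractions import Fraction

def ix35_holds(first, ratio, n):
    """Check (a_{n+1} - a_1) : (a_1 + ... + a_n) = (a_2 - a_1) : a_1
    for the geometric progression a_k = first * ratio**(k - 1)."""
    # a[0] is a_1 and a[n] is a_{n+1}
    a = [Fraction(first) * Fraction(ratio) ** k for k in range(n + 1)]
    lhs = (a[n] - a[0]) / sum(a[:n])   # excess of the last : all those before it
    rhs = (a[1] - a[0]) / a[0]         # excess of the second : the first
    return lhs == rhs

print(ix35_holds(1, 2, 3))               # progression 1, 2, 4, 8
print(ix35_holds(3, Fraction(3, 2), 6))  # a non-integer common ratio
```

Solving the proportion for the sum gives the familiar formula for the sum of a geometric progression, which is the form in which the result is usually met today.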

Together with the centrality of diagrams, the six- 
part structure is also typical of most of Greek math- 
ematics. The constructions and diagrams that typi- 
cally appeared in Greek mathematical proofs were not 
of an arbitrary kind, but what we identify today as 
straightedge-and-compass constructions. The reason- 
ing in the apodeixis part could be either a direct deduc- 
tion or an argument by contradiction, but the result was 
always known in advance and the proof was a means 
to justify it. In addition, Greek geometric thinking, 
and in particular Euclid-style geometric proofs, strictly 
adhered to a principle of homogeneity. That is, magni- 
tudes were only compared with, added to, or subtracted 



Figure 3 Proposition XII.2 of Euclid’s Elements. 

from magnitudes of like kind: numbers, lengths, areas,
or volumes. (See numbers [II.1 §2] for more about this.)

Of particular interest are those Greek proofs con- 
cerned with lengths of curves, as well as with areas or 
volumes enclosed by curvilinear shapes. Greek mathe- 
maticians lacked a flexible notation capable of express- 
ing the gradual approximation of curves by polygons 
and an eventual passage to the infinite. Instead, they
devised a special kind of proof that involved what can 
retrospectively be seen as an implicit passage to the 
limit, but which did so in the framework of a purely geo- 
metric proof and thus unmistakably followed the six- 
part proof-scheme described above. This implicit passage
to the infinite was based on the application of a
continuity principle, later associated with Archimedes
[VI.3]. In Euclid’s formulation, for instance, the principle
states that, given two unequal magnitudes of the
same kind, A, B (be they two lengths, two areas, or two 
volumes), with A greater than B, and if we subtract from 
A a magnitude which is greater than A/2, and from 
the remainder we subtract a magnitude that is greater 
than its half, and if this process is iterated a sufficient 
number of times, then we will eventually remain with a 
magnitude that is smaller than B. Euclid used this prin- 
ciple to prove, for instance, that the ratio of the areas 
of two circles equals the ratio of the squares of their 
diameters (XII.2). The method used, later known as the 
exhaustion method, was based on a double contradic- 
tion that became standard for many centuries to come. 
This double contradiction is illustrated in figure 3, the 
accompanying diagram to the proposition. 
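
The continuity principle just described can be imitated with numbers. The sketch below is a modern illustration under stated assumptions: magnitudes are represented by exact fractions, the same proportion (more than half) is removed at every step, and the name `steps_until_smaller` is invented here. Euclid's principle only requires that more than half be removed each time, not that the proportion be constant.

```python
from fractions import Fraction

def steps_until_smaller(A, B, fraction_removed=Fraction(3, 5)):
    """Repeatedly remove more than half of what remains of A until the
    remainder is smaller than B; return the number of steps needed."""
    if not (A > B > 0):
        raise ValueError("need A > B > 0")
    if not (Fraction(1, 2) < fraction_removed < 1):
        raise ValueError("each step must remove more than half, but not all")
    remainder, steps = Fraction(A), 0
    while remainder >= B:
        remainder -= fraction_removed * remainder  # subtract more than half
        steps += 1
    return steps

print(steps_until_smaller(100, 1))  # 100 -> 40 -> 16 -> ... until below 1
```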

If the ratio of the square on BD to the square on FH 
is not the same as the ratio of circle ABCD to circle 
EFGH, then it must be the same as the ratio of circle 
ABCD to an area S either larger or smaller than cir- 
cle EFGH. The curvilinear figures are approximated by 
polygons, since the continuity principle allows the difference
between the inscribed polygon and the circle
to be made as small as desired (e.g., smaller than the difference
between S and EFGH). The “double contradiction”
is reached if one assumes that S is either smaller or
larger than EFGH. 
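
The two facts on which the argument rests can be illustrated numerically: doubling the number of sides of an inscribed polygon removes more than half of the area still missing from the circle, and similar polygons inscribed in two circles are to one another as the squares on the diameters. The sketch below is a modern check that uses π and the sine function, neither of which was available to Euclid; the function names are assumptions made for this illustration.

```python
import math

def inscribed_polygon_area(diameter, sides):
    """Area of a regular polygon with `sides` sides inscribed in a circle."""
    r = diameter / 2
    return 0.5 * sides * r * r * math.sin(2 * math.pi / sides)

def deficits(diameter, doublings):
    """Circle area minus inscribed polygon area for 4, 8, 16, ... sides."""
    circle = math.pi * (diameter / 2) ** 2
    return [circle - inscribed_polygon_area(diameter, 4 * 2 ** k)
            for k in range(doublings)]

d = deficits(2.0, 8)
# Each doubling leaves less than half of the previous deficit.
print(all(d[i + 1] < d[i] / 2 for i in range(len(d) - 1)))
# Polygons in circles of diameters 2 and 3 are in the ratio 4 : 9.
print(inscribed_polygon_area(2.0, 64) / inscribed_polygon_area(3.0, 64))
```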

Forms of proof and constructions other than those 
mentioned so far are occasionally found in Greek math- 
ematical texts. These include diagrams based on what 
is assumed to be the synchronized motion of two lines 
(e.g., the trisectrix, or Archimedes’ spiral), mechanical 
devices of many sorts, or reasoning based on ideal- 
ized mechanical considerations. However, the Euclid- 
ean type of proof described above remained a model 
to be followed wherever possible. There is a famous 
Archimedes palimpsest that provides evidence of how 
less canonical methods, drawing on mechanical consid- 
erations (albeit of a highly idealized kind), were used to
deduce results about areas and volumes. However, even 
this bears testimony to the primacy of the ideal model: 
there is a letter from Archimedes to Eratosthenes in 
which he displays the ingenuity of his mechanical meth- 
ods but at the same time is at pains to stress their
heuristic character. 

3 Islamic and Renaissance Mathematics

Just as Euclid is considered to be representative of 
a mainstream tradition in Greek mathematics,
al-Khwārizmī [VI.5] is regarded as a typical representative
of Islamic mathematics. There are two main traits
of his work that are relevant to the present account 
and that became increasingly central to the develop- 
ment of mathematics, starting with his works in the 
late eighth century and continuing until the works of 
Cardano [VI.7] in sixteenth-century Italy. These traits
are a pervasive “algebraization” of mathematical thinking,
and a continued reliance on Euclidean-style geometric
proof as the main way of legitimizing the validity
of mathematical knowledge in general and of algebraic 
reasoning in mathematics in particular. 

The prime example of this combination is found in 
al-Khwārizmī’s seminal text al-Kitāb al-mukhtaṣar fī
ḥisāb al-jabr wa’l-muqābala (“The compendious book
on calculation by completion and balancing”), where 
he discusses the solutions of problems in which the 
unknown length appears in combination with numbers 
and squares (the side of which is an unknown). Since he 
only envisages the possibility of positive “coefficients” 
and positive rational solutions, al-Khwārizmī needs to
consider six different situations, each of which requires



Figure 4 Al-Khwārizmī’s geometric justification
of the formula for a quadratic equation. 

a different recipe for finding the unknown: the full- 
grown idea of a general quadratic equation and an algorithm
to solve it in all cases does not appear in Islamic
mathematical texts. For instance, the problem “squares
and roots equal to numbers” (e.g., $x^2 + 10x = 39$, in
modern notation) and the problem “roots and numbers
equal to squares” (e.g., $3x + 4 = x^2$) are considered
to be completely different ones, as are their solutions,
and accordingly al-Khwārizmī treats them separately.
In all cases, however, al-Khwārizmī proves the validity
of the method described by translating it into geometric
terms and then relying on Euclid-like geometric theorems
built around a specific diagram. It is noteworthy,
however, that the problems refer to specific numeri- 
cal quantities associated with the magnitudes involved, 
and these measured magnitudes refer to the accompa- 
nying diagrams as well. In this way, al-Khwārizmī interestingly
departs from the Euclidean style of proof. Still,
the Greek principle of homogeneity is essentially preserved,
as the three quantities usually involved in the
problem are all of the same kind, namely, areas.

Consider, for instance, the equation $x^2 + 10x = 39$,
which corresponds to the following problem of
al-Khwārizmī.

What is the square which combined with ten of its roots
will give a sum total of 39?

The recipe prescribes the following steps.

Take one-half of the roots [5] and multiply them by 
itself [25]. Add this amount to 39 and obtain 64. Take 
the square root of this, which is eight, subtract from it 
half the roots, leaving three. The number three there- 
fore represents one root of this square, which itself, of 
course, is nine. 

The justification is provided by figure 4.

Here ab represents the said square, which for us is
$x^2$, and the rectangles c, d, e, f represent an area of
$\tfrac{10}{4}x$ each, so that all of them together equal $10x$, as
in the problem. Thus, the small squares in the cor- 
ners represent an area of 6.25 each, and we can “com- 
plete” the large square, being equal to 64, and whose 
side is therefore 8, thus yielding the solution 3 for the 
unknown. 
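
The recipe and its geometric justification can be followed step by step in modern arithmetic. The Python sketch below is an illustration only: the function names and the parameters `b` and `c` (for the case “squares and roots equal to numbers,” $x^2 + bx = c$ with positive $b$ and $c$) are assumptions introduced here, and nothing like this symbolic generality appears in the original text.

```python
import math

def squares_and_roots_equal_numbers(b, c):
    """Positive root of x^2 + b*x = c (b, c > 0), following the quoted recipe."""
    half = b / 2                     # take one-half of the roots
    total = half * half + c          # multiply it by itself, add the number
    return math.sqrt(total) - half   # take the square root, subtract the half

def completed_square_side(b, c):
    """Side of the large square in the diagram: the square x^2, four
    rectangles of area (b/4)*x, and four corner squares of side b/4."""
    corner = (b / 4) ** 2            # 6.25 when b = 10
    return math.sqrt(c + 4 * corner)

print(squares_and_roots_equal_numbers(10, 39))  # the unknown
print(completed_square_side(10, 39))            # side of the completed square
```

Subtracting the two corner widths, each $b/4$, from the side of the completed square recovers the same value for the unknown as the recipe does.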

Abu Kamil Shuja, just one generation after
al-Khwārizmī, added force to this approach when he
solved additional problems while specifically relying 
on theorems taken from the Elements, including the 
accompanying diagrams, in order to justify his method 
of solution. The primacy of the Euclidean-type proof, 
which was already accepted in geometry and arith- 
metic, thus also became associated with the algebraic 
methods that eventually turned into the main topic of 
interest in Renaissance mathematics. Cardano's 1545 
Ars Magna, the foremost example of this new trend, 
presented a complete treatment of the equations of 
third and fourth degree. Although the algebraic line 
of reasoning that he adopted and developed became 
increasingly abstract and formal, Cardano continued 
to justify his arguments and methods of solution by 
reference to Euclid-like geometric arguments based on 
diagrams. 

4 Seventeenth-Century Mathematics 

The next significant change in the conception of proof 
appears in the seventeenth century. The most influen- 
tial development of mathematics in this period was the 
creation of the infinitesimal calculus simultaneously 
by Newton [VI.14] and Leibniz [VI.15]. This momentous
development was the culmination of a process
that spanned most of the century, involving the intro- 
duction and gradual improvement of important tech- 
niques for determining areas and volumes, gradients 
of tangents, and maxima and minima. These develop- 
ments included the elaboration of traditional points of 
view that went back to the Greek classics, as well as the 
introduction of completely new ideas such as the “indi- 
visibles,” whose status as a legitimate tool for math- 
ematical proof was hotly debated. At the same time, 
the algebraic techniques and approaches that Renais- 
sance mathematicians continued to expand upon, fol- 
lowing on from their Islamic predecessors, now gained
additional impetus and were gradually incorporated
(starting with the work of Fermat [VI.12] and
Descartes [VI.11]) into the arsenal of tools available for



Figure 5 Diagram for Fermat’s proof 
of the area under a hyperbola. 

proving geometric results. Underlying these various 
trends were different conceptions and practices of 
mathematical proof, which are briefly described and 
illustrated now. 

Examples of how the classical Greek conception of 
geometric proof was essentially followed but at the 
same time fruitfully modified and expanded are found 
in the work of Fermat, as can be seen in his calcula- 
tion of the area enclosed by a generalized hyperbola 
(in modern notation, $(y/a)^m = (x/b)^n$ with $m, n \neq 1$) and
its asymptotes. 

The quadratic hyperbola (i.e., a figure represented by 
$y = 1/x^2$), for instance, is defined here in terms of a
purely geometric relationship on any two of its points, 
namely, that the ratio between the squares built on the 
abscissas equals the inverse ratio between the lengths 
of the ordinates. In its original version it is expressed as 
follows: $AG^2 : AH^2 :: IH : EG$ (see figure 5). It should be
noticed that this is not an equation in the present sense 
of the word, on which the standard symbolic manipulations
can be directly performed. Rather, this is a four-
term proportion to which the rules of Greek classical 
mathematics apply. Also, the proof was entirely geo- 
metric and indeed it essentially followed the Euclidean 
style. Thus, if the segments AG, AH, AO, etc., are cho- 
sen in continued proportion, then one can prove that 
the rectangles EH, IO, NM, etc., are also in continued
proportion, and indeed that $EH : IO :: IO : NM :: \cdots :: AH : AG$.

Fermat made use of proposition IX.35 of the Elements
(mentioned above), which comprises an expression for 
the sum of any number of quantities in a geometric 
progression, namely (in more modern notation): 

$(a_{n+1} - a_1) : (a_1 + a_2 + \cdots + a_n) = (a_2 - a_1) : a_1$.



But at this point his proof takes an interesting turn. 
He introduces the somewhat obscure concept of “ade- 
quare,” which he found in the works of Diophantus, 
and which allows a kind of “approximate equality.” 
Specifically, this idea allows him to bypass the cumber- 
some procedure of double contradiction typically used 
in Greek geometry as an implicit passage to the infi- 
nite. A figure bounded by GE, by the horizontal asymp- 
tote, and by the hyperbola will equal the infinite sum 
of rectangles obtained when the rectangle EH “will van- 
ish and will be reduced to nothing.” Further, proposi- 
tion IX.35 implies that this sum equals the area of the 
rectangle BG. Significantly, Fermat still chose to rely on 
the authority of the ancients, hinting at the method of 
double contradiction when he declared that this result 
“would be easy to confirm by a more lengthy proof 
carried out in the manner of Archimedes.” 
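
Fermat's argument for the quadratic hyperbola can be imitated numerically. In the sketch below, which is a modern illustration and not Fermat's own procedure, the abscissas are taken in continued proportion with ratio `r` starting from AG = `x0`, the rectangles over successive intervals are summed, and `r` is then brought close to 1. The function name, the stopping tolerance, and the reading of rectangle BG as the rectangle with sides AG and GE (so that its area is $1/x_0$) are assumptions made here, since the figure is not reproduced.

```python
def rectangle_sum(x0, r, tol=1e-12):
    """Sum of the rectangles under y = 1/x**2 to the right of x0 whose
    bases are cut off by the abscissas x0, x0*r, x0*r**2, ... (r > 1)."""
    if x0 <= 0 or r <= 1:
        raise ValueError("need x0 > 0 and r > 1")
    total, x = 0.0, float(x0)
    while True:
        term = (r - 1) / x   # base x*(r - 1) times height 1/x**2
        if term < tol:       # remaining rectangles are negligible
            return total
        total += term
        x *= r

for r in (2.0, 1.1, 1.001):
    print(r, rectangle_sum(2.0, r))  # approaches 1/x0 = 0.5 as r nears 1
```

The terms form a geometric progression, so for each fixed `r` the total is exactly `r / x0`; letting `r` approach 1 plays the role of the rectangle EH that “will vanish and will be reduced to nothing.”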

Attempts to expand the accepted canon of geo- 
metric proof eventually led to the more progressive 
approaches associated with the idea of indivisibles 
(described below), as practiced by Cavalieri, Roberval, 
and Torricelli. This is well-illustrated by Torricelli’s 
1643 calculation of the volume of the infinite body 
created by (expressed in modern terms) rotating the 
hyperbola $xy = k^2$ around the y-axis, with values of x
between 0 and a. 

The essential idea of indivisibles is that areas are con- 
sidered to be sums, or collections, of infinitely many 
line segments, and volumes are considered to be sums, 
or collections, of infinitely many areas. In this example,
Torricelli calculated the volume of revolution by
considering it to be a sum of the curved surfaces of
an infinite collection of cylinders successively inscribed
within each other and having radii ranging from 0 to a.
The area of the curved surface of the inscribed cylinder
with radius x is $2\pi x(k^2/x) = 2\pi k^2$ and is thus equal, for
any x, to the area of the circle of radius AS, where S is the point
(k, k) on the hyperbola in figure 6(b). 

However, from the figure it can be seen that in building
the entire rotational body there is a cylindrical surface
associated with each possible length between 0
and a, and therefore that the total volume of the infi- 
nite body can be considered as being composed of all 
the cylindrical surfaces, which in turn equals the infi- 
nite sum of circles, each of which is associated with a 
radius between 0 and a (see figure 6(c)), and which is 
equal to the volume of a cylinder with radius AS and 
height a (see figure 6(d)). 
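
The key observation, that every cylindrical surface has the same area as the circle of radius AS, is easy to confirm in modern terms. The sketch below replaces Torricelli's indivisibles with a sum of thin shells, which is a present-day reading and not his method; the function names, the number of shells `n`, and the assumption that A is the origin (so that AS has length $\sqrt{2}\,k$) are introduced here for illustration.

```python
import math

def shell_area(x, k):
    """Curved surface of the cylinder of radius x and height k**2 / x."""
    return 2 * math.pi * x * (k**2 / x)

def volume_by_shells(k, a, n=1000):
    """Add up n thin cylindrical shells with radii between 0 and a."""
    dx = a / n
    return sum(shell_area((i + 0.5) * dx, k) * dx for i in range(n))

k, a = 2.0, 5.0
circle_AS = math.pi * (math.sqrt(2) * k) ** 2  # circle of radius AS
print(shell_area(0.3, k), circle_AS)           # equal for every radius x
print(volume_by_shells(k, a), circle_AS * a)   # cylinder of radius AS, height a
```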

The rules of Euclid-like geometric proof were completely
contravened in proofs of this kind and this
made them unacceptable in the eyes of many. On the
other hand, their fruitfulness was highly appealing,
especially in cases like this one in which an infinite body
was shown to have a finite volume, a result which Torri- 
celli himself found extremely surprising. Both support- 
ers and detractors alike, however, realized that tech- 
niques of this kind might lead to contradictions and 
inaccurate results. By the eighteenth century, with the 
accelerated development of the infinitesimal calculus 
and its associated techniques and concepts, techniques 
based on indivisibles had essentially disappeared. 

The limits set by the classical paradigm of Euclid- 
ean geometric proof were then transgressed in a dif- 
ferent direction by the all-embracing algebraization of 
geometry at the hands of Descartes. The fundamental
step undertaken by Descartes was to introduce unit
lengths as a key element in the diagrams used in geo- 
metric proofs. The radical innovation implied by this 
step, allowing the hitherto nonexistent possibility of 
defining operations with line segments, was explicitly 
stressed by Descartes in La Géométrie in 1637: 

Just as arithmetic consists of only four or five oper- 
ations, namely addition, subtraction, multiplication, 
division, and the extraction of roots, which may be 
considered a kind of division, so in geometry, to find 
required lines it is merely necessary to add or subtract 
other lines; or else, taking one line, which I shall call the 
unit in order to relate it as closely as possible to num- 
bers, and which can in general be chosen arbitrarily, 
and having given two other lines, to find a fourth line 
which shall be to one of the given lines as the other is to 
the unit (which is the same as multiplication); or again, 
to find a fourth line which is to one of the given lines as 
the unit is to the other (which is equivalent to division); 
or, finally, to find one, two, or several mean proportion- 
als between the unit and some other line (which is the 
same as extracting the square root, cube root, etc., of 
the given line). 

Thus, for instance, given two segments BD, BE, the 
division of their lengths is represented by BC in figure 7, 
in which AB represents the unit length. 
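
The construction can be reproduced with coordinates. The sketch below assumes a particular layout, since the figure is not reproduced here: B is the origin, A and D lie on one ray from B with AB the unit, E lies on a second ray, and C is the point where the line through A parallel to DE meets that second ray. The function name and the `angle` parameter are likewise assumptions. By similar triangles BC : BE = AB : BD, so BC represents the quotient.

```python
import math

def divide_segments(BE, BD, angle=math.pi / 3):
    """Length BC with BC : BE = AB : BD, where AB is the unit length.

    A and D lie on one ray from B; E lies on a second ray at `angle` to it.
    C is where the line through A parallel to DE meets the second ray.
    """
    ux, uy = math.cos(angle), math.sin(angle)  # direction of the second ray
    A, D = (1.0, 0.0), (BD, 0.0)
    E = (BE * ux, BE * uy)
    dx, dy = E[0] - D[0], E[1] - D[1]          # direction of the line DE
    # Solve A + s*(dx, dy) = t*(ux, uy); t is the distance BC.
    det = ux * dy - dx * uy
    return (dy * A[0] - dx * A[1]) / det

print(divide_segments(6.0, 3.0))             # BC for BE = 6, BD = 3
print(divide_segments(5.0, 2.0, angle=1.0))  # independent of the angle chosen
```

The result does not depend on the angle between the two rays, which reflects the fact that only the proportions of the similar triangles matter.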

Although the proof was Euclid-like in appearance 
(because of the diagram and the use of the theory of 
similar triangles), the introduction of the unit length 
and its use for defining the operations with segments 
set it radically apart and opened completely new hori- 
zons for geometric proofs. Not only had measurements 
of length been absent from Euclidean-style proofs thus 
far, but also, as a consequence of the very existence 
of these operations, the essential dimensionality tra- 
ditionally associated with geometric theorems lost its
significance. Descartes used expressions such as $a - b$,
$a/b$, $a^2$, $b^3$, and their roots, but he stressed that
they should all be understood as “only simple lines, 
which, however, I name squares, cubes, etc., so that I 
make use of the terms employed in algebra.” With the 
removal of dimensionality, the requirement of homo- 
geneity also became unnecessary. Unlike his predecessors,
who handled magnitudes only when they had a
direct geometric significance, Descartes could not see
any problem in forming an expression such as $a^2b^2 - b$
and then extracting its cube root. In order to do so, he
said, “we must consider the quantity $a^2b^2$ divided once
by the unit, and the quantity b multiplied twice by the
unit.” Sentences of this kind would be simply incomprehensible
to Greek geometers, as well as to their Islamic
and Renaissance followers.


This algebraization of geometry, and particularly the 
newly created possibility of proving geometric facts via 
algebraic procedures, was strongly related to the recent 
consolidation of the idea of an algebraic equation, seen 
as an autonomous mathematical entity, for which for- 
mal rules of manipulation were well-known and could 
be systematically applied. This idea reached full matu- 
rity in the hands of Viète [VI.9] only around 1591. But
not all mathematicians in the seventeenth century saw 
the important developments associated with algebraic 
thinking either as a direction to be naturally adopted 
or as a clear sign of progress in the latter discipline. 
A prominent opponent of any attempt to deviate from 
the classical Euclidean-style approach in geometry was 
none other than Newton [VI.14], who, in the Arithmetica
Universalis (1707), was emphatic in expressing
his views:

Figure 7 Descartes’s geometric calculation
of the division of two given segments. 


Equations are expressions of arithmetic computation 
and properly have no place in geometry, except in 
so far as truly geometrical quantities (lines, surfaces, 
solids and proportions) are thereby shown equal, some 
to others. Multiplications, divisions, and computations 
of that kind have recently been introduced into geom- 
etry, unadvisedly and against the first principle of this 
science. ... Therefore these two sciences ought not to
be confounded, and recent generations by confounding
them have lost that simplicity in which all geometrical 
elegance consists. 

Newton’s Principia bears witness to the fact that
statements like this one were far from mere lip ser- 
vice, as Newton consistently preferred Euclidean-style 
proofs, considering them to be the correct language for 
presenting his new physics and for bestowing it with 
the highest degree of certainty. He used his own cal- 
culus only where strictly necessary, and barred algebra
from his treatise entirely. 

5 Geometry and Proof in 
Eighteenth-Century Mathematics 

Mathematical analysis became the primary focus of 
mathematicians in the eighteenth century. Questions 
relating to the foundations of analysis arose immedi- 
ately after the calculus began to be developed and were 
not settled until the late nineteenth century. To a considerable
extent these questions were about the nature
of legitimate mathematical proof, and debates about 
them played an important role in undermining the long- 
undisputed status of geometry as the basis for math- 
ematical certainty and bestowing this status on arith- 
metic instead. The first important stage in this process 
was Euler’s [VI.19] reformulation of the calculus. Once
separated from its purely geometric roots, the calculus
came to be centered on the algebraically oriented concept
of function. This trend for favoring algebra over
geometry was given further impetus by Euler’s successors.
D’Alembert [VI.20], for instance, associated
mathematical certainty above all with algebra (because
of its higher degree of generality and abstraction) and
only subsequently with geometry and mechanics. This 
was a clear departure from the typical views of Newton 
and of his contemporaries. The trend reached a peak 
and was transformed into a well-conceived program 
in the hands of Lagrange [VI.22], who in the preface
to his 1788 Mécanique Analytique famously expressed
a radical view about how one could achieve certainty
in the mathematical sciences while distancing oneself
from geometry. He wrote as follows: 

One will not find figures in this work. The methods
that I expound require neither constructions, nor geo- 
metrical or mechanical arguments, but only algebraic 
operations, subject to a regular and uniform course. 

The details of these developments are beyond the scope 
of this article. What is important to stress, however, is 
that in spite of their very considerable impact, the basic 
conceptions of proof in the more mainstream realm of 
geometry did not change very much during the eighteenth
century. An illuminating perspective on these
conceptions is offered by the views of contemporary 
philosophers, especially Immanuel Kant. 

Kant had a very profound knowledge of contem- 
porary science, and particularly of mathematics. A 
philosophical discussion of his views on mathematical 
knowledge and proof need not concern us here. How- 
ever, given his acquaintance with contemporary con- 
ceptions, they do provide an insightful historical per- 
spective on proof as it was understood at the time. Of 
particular interest is the contrast he draws between a 
philosophical argument, on the one hand, and a geometric
proof, on the other. Whereas the former deals
with general concepts, the latter deals with concrete, 
yet nonempirical, concepts, by reference to “visualiz- 
able intuitions” (Anschauung). This difference is epitomized
in the following famous passage from his
Critique of Pure Reason.

Suppose a philosopher be given the concept of a tri- 
angle and he is left to find out, in his own way, what 
relation the sum of its angles bears to a right angle. 
He has nothing but the concept of a figure enclosed by 
three straight lines, and possessing three angles. How- 
ever long he meditates on this concept, he will never 
produce anything new. He can analyze and clarify the 
concept of a straight line or of an angle or of the num- 
ber three, but he can never arrive at any properties not
already contained in these concepts. Now let the geometrician
take up these questions. He at once begins by
constructing a triangle. Since he knows that the sum of
two right angles is exactly equal to the sum of all the
adjacent angles which can be constructed from a single
point on a straight line, he prolongs one side of his triangle
and obtains two adjacent angles, which together
are equal to two right angles. He then divides the external
angle by drawing a line parallel to the opposite side
of the triangle, and observes that he has thus obtained 
an external adjacent angle which is equal to an internal 
angle— and so on. In this fashion, through a chain of 
inferences guided throughout by intuition, he arrives 
at a fully evident and universally valid solution of the 
problem. 

In a nutshell, then, for Kant the nature of mathemat- 
ical proof that sets it apart from other kinds of deduc- 
tive argumentation (like philosophy) lies in the central- 
ity of the diagrams and the role that they play. As in the 
Elements, this diagram is not just a heuristic guide for 
what is no more than abstract reasoning, but rather an 
“intuition,” a singular embodiment of the mathematical 
idea that is clearly located not only in space, but rather 
in space and time. In fact,

I cannot represent to myself a line, however small, with- 
out drawing it in thought, that is gradually generat- 
ing all its parts from a point. Only in this way can the 
intuition be obtained. 

This role played by diagrams as “visualizable intu- 
itions” is what provides, for Kant, the explanation of 
why geometry is not just an empirical science, but also 
not just a huge tautology devoid of any synthetic con- 
tent. According to him, geometric proof is constrained 
by logic but it is much more than just a purely logical
analysis of the terms involved. This view was at the
heart of a novel philosophical analysis whose starting 
point was the then-entrenched conception of what a 
mathematical proof is. 

6 Nineteenth-Century Mathematics and 
the Formal Conception of Proof 

The nineteenth century was full of important develop- 
ments in geometry and other parts of mathematics, not 
just of the methods but also of the aims of the various
subdisciplines. Logic, as a field of knowledge, also
underwent significant changes and a gradual mathema- 
tization that entirely transformed its scope and meth- 
ods. Consequently, by the end of the century the con- 
ception of proof and its role in mathematics had also 
been deeply transformed. 


In Göttingen in 1854 Riemann [VI.49] gave his seminal
talk “On the hypotheses which lie at the foundations
of geometry.” At around the same time, the
works of Bolyai [VI.34] and Lobachevskii [VI.31] on
non-Euclidean geometry, as well as the related ideas of
Gauss [VI.26], all dating from the 1830s, began to be
more generally known. The existence of coherent, alter- 
native geometries brought about a pressing need for 
the most basic, longstanding beliefs about the essence 
of geometric knowledge, including the role of proof 
and mathematical rigor, to be revised. Of even greater 
significance in this regard was the renewed interest in 
projective geometry [I.3 §6.7], which became a very
active field of research with its own open research
questions and foundational issues after the publica- 
tion of Jean Poncelet’s 1822 treatise. The addition of 
projective geometry to the many other possible geo- 
metric perspectives prompted a variety of attempts at 
unification and classification, the most significant of 
which were those based on group-theoretic ideas. Particularly
notable were those of Klein [VI.57] and Lie
[VI.53] in the 1870s. In 1882, Moritz Pasch published 
an influential treatise on projective geometry devoted 
to a systematic exploration of its axiomatic foundations 
and the interrelationships among its fundamental the- 
orems. Pasch’s book also attempted to close the many 
logical gaps that had been found in Euclidean geometry 
over the years. More systematically than any of his fel- 
low nineteenth-century mathematicians, Pasch empha- 
sized that all geometric results should be obtained 
from axioms by strict logical deduction, without rely- 
ing on analytical means, and above all without appeal to 
diagrams or to properties of the figures involved. Thus, 
although in some ways he was consciously reverting 
to the canons of Euclid-like proof (which by then were 
somewhat loosened), his attitude toward diagrams was 
fundamentally different. Aware of the potential limitations
of visualizing diagrams (and perhaps their misleading
influence) he put a much greater emphasis on
the pure logical structure of the proof than his prede- 
cessors had. Nevertheless, he was not led to an out- 
right formalist view of geometry and geometric proof. 
Rather, he consistently adopted an empirical approach 
to the origins and meaning of geometry and fell short
of claiming that diagrams were for heuristic use only: 

The basic propositions [of geometry] cannot be understood
without corresponding drawings; they express
what has been observed from certain, very simple facts.
The theorems are not founded on observations, but
rather, they are proved. Every inference performed during
a deduction must find confirmation in a drawing,
yet it is not justified by a drawing but from a certain
preceding statement (or a definition).

Pasch’s work definitely contributed to diagrams los- 
ing their central status in geometric proofs in favor of 
purely deductive relations, but it did not directly lead 
to a thorough revision of the status of the axioms of 
geometry, or to a change in the conception that geom- 
etry deals essentially with the study of our spatial, visu- 
alizable intuition (in the sense of Anschauung). The all- 
important nineteenth-century developments in geom- 
etry produced significant changes in the conception of 
proof only under the combined influence of additional 
factors. 

Mathematical analysis continued to be a primary field 
of research, and the study of its foundations became 
increasingly identified with arithmetic, rather than geo- 
metric, rigor. This shift was provoked by the works 
of mathematicians like Cauchy [VI.29], Weierstrass
[VI.44], Cantor [VI.54], and Dedekind [VI.50], which
aimed at eliminating intuitive arguments and concepts
in favor of ever more elementary statements and definitions.
(In fact, it was not until the work of Dedekind
on the foundations of arithmetic, in the last third of 
the century, that the rigorous formulation pursued in 
these works was given any kind of axiomatic underpin- 
ning.) The idea of investigating the axiomatic basis of 
mathematical theories, whether geometry, algebra, or 
arithmetic, and of exploring alternative possible sys- 
tems of postulates was indeed pursued during the nine- 
teenth century by mathematicians such as George Pea- 
cock, Charles Babbage, John Herschel, and, in a differ- 
ent geographical and mathematical context, Hermann 
Grassmann. But such investigations were the exception 
rather than the rule, and they had only a fairly limited 
role in shaping a new conception of proof in analysis 
and geometry. 

One major turning point, where the above trends 
combined to produce a new kind of approach to proof, 
is to be found in the works of Giuseppe Peano [VI.62]
and his Italian followers. Peano’s mainstream activities 
were as a competent analyst, but he was also interested 
in artificial languages, and particularly in developing an 
artificial language that would allow a completely formal 
treatment of mathematical proofs. In 1889 his successful
application of such a conceptual language to arithmetic
yielded his famous postulates for the natural
numbers [III.69]. Pasch’s systems of axioms for
projective geometry posed a challenge to Peano’s artificial
language, and he set out to investigate the relationship
between the logical and the geometric terms
involved in the deductive structure of geometry. In this 
context he introduced the idea of an independent set 
of axioms, and applied this concept to his own system 
of axioms for projective geometry, which were a slight 
modification of Pasch’s. This view did not lead Peano to
a formalistic conception of proof, and he still conceived 
geometry in terms very similar to his predecessors: 

Anyone is allowed to take a hypothesis and develop 
its logical consequences. However, if one wants to 
give this work the name of geometry it is necessary 
that such hypotheses or postulates express the result 
of simple and elementary observations of physical 
figures. 

Under the influence of Peano, Mario Pieri developed 
a symbolism with which to handle abstract-formal the- 
ories. Unlike Peano and Pasch, Pieri consistently pro- 
moted the idea of geometry as a purely logical sys- 
tem, where theorems are deduced from hypothetical 
premises and where the basic terms are completely 
detached from any empirical or intuitive significance. 

A new chapter in the history of geometry and of 
proof was opened at the end of the nineteenth century 
with the publication of Hilbert’s [VI.63] Grundlagen
der Geometrie, a work that synthesized and brought to 
completion the various trends of geometric research 
described above. Hilbert was able to achieve a com- 
prehensive analysis of the logical interrelations among 
the fundamental results of projective geometry, such 
as the theorems of Desargues and Pappus, while pay- 
ing particular attention to the role of continuity con- 
siderations within their proofs. His analysis was based 
on the introduction of a generalized analytic geometry, 
in which the coordinates may be taken from a variety 
of different number fields [III.65], rather than from 
the real numbers alone. This approach created a purely 
synthetic arithmetization of any given type of geom- 
etry, and thus helped to clarify the logical structure 
of Euclidean geometry as a deductive system. It also 
clarified the relationship between Euclidean geometry 
and the various other kinds of known geometries — non- 
Euclidean, projective, or non-Archimedean. This focus 
on logic implied, among other things, that diagrams 
should be relegated to a merely heuristic role. In fact,
although diagrams still appear in many proofs in the 
Grundlagen, the entire purpose of the logical analysis 
is to avoid being misled by diagrams. Proofs, and particularly
geometric proofs, have thus become purely logical
arguments, rather than arguments about diagrams.
And at the same time, the essence and the role of the 
axioms from which the derivations in question start 
also underwent a dramatic change. 

Following Pasch’s lead, Hilbert introduced a new sys- 
tem of axioms for geometry that attempted to close 
the logical gaps inherent in earlier systems. These 
axioms were of five kinds— axioms of incidence, of 
order, of congruence, of parallels, and of continuity— 
each of which expressed a particular way in which 
spatial intuition manifests itself in our understanding. 
They were formulated for three fundamental kinds of 
object: points, lines, and planes. These remained unde- 
fined, and the system of axioms was meant to provide 
an implicit definition of them. In other words, rather 
than defining points or lines at the outset and then postulating
axioms that are assumed to be valid for them,
a point and a line were not directly defined, except as 
entities that satisfy the axioms postulated by the sys- 
tem. Further, Hilbert demanded that the axioms in a 
system of this kind should be mutually independent, 
and introduced a method for checking that this demand 
is fulfilled; in order to do so, he constructed models 
of geometries that fail to satisfy a given axiom of the 
system but satisfy all the others. Hilbert also required
that the system be consistent, and he showed that the
consistency of geometry could be made to depend, in his
system, on that of arithmetic. He initially assumed that
proving the consistency of arithmetic would not present a
major obstacle and it was a long time before he realized 
that this was not the case. Two additional requirements 
that Hilbert initially introduced for axiomatic systems 
were simplicity and completeness. Simplicity meant, in 
essence, that an axiom should not contain more than 
“a single idea.” The demand that every axiom in a sys- 
tem be “simple,” however, was never clearly defined or 
systematically pursued in subsequent works of Hilbert 
or any of his successors. The last requirement, com- 
pleteness, meant for Hilbert in 1900 that any adequate 
axiomatization of a mathematical domain should allow 
for a derivation of all the known theorems of the disci- 
pline in question. Hilbert claimed that his axioms would 
indeed yield all the known results of Euclidean geom- 
etry, but of course this was not a property that he could 
formally prove. In fact, since this property of “completeness”
cannot be formally checked for any given
axiomatic system, it did not become one of the standard
requirements of an axiomatic system. It is important
to note that the concept of completeness used by
Hilbert in 1900 is completely different from the currently
accepted, model-theoretical one that appeared
much later. The latter amounts to the requirement that
in a given axiomatic system every true statement, be it 
known or unknown, should be provable. 

The use of undefined concepts and the concomitant 
conception of axioms as implicit definitions gave enor- 
mous impetus to the view of geometry as a purely logi- 
cal system, such as Pieri had devised it, and eventually 
transformed the very idea of truth and proof in mathe- 
matics. Hilbert claimed on various occasions— echoing 
an idea of Dedekind— that, in his system, “points, lines, 
and planes” could be substituted by “chairs, tables, and 
beer mugs,” without thereby affecting in any sense the
logical structure of the theory. Moreover, in the light 
of discussions about set-theoretical paradoxes, Hilbert 
strongly emphasized the view that the logical consis- 
tency of a concept implicitly defined by axioms was the 
essence of mathematical existence. Under the influence 
of these views, of the new methodological tools intro- 
duced by Hilbert, and of the successful overview of the 
foundations of geometry thus achieved, many mathe- 
maticians went on to promote new views of mathemat- 
ics and new mathematical activities that in many senses 
went beyond the views embodied in Hilbert’s approach. 
On the one hand, a trend that thrived in the United
States at the beginning of the twentieth century, led by 
Eliakim H. Moore, turned the study of systems of postulates
into a mathematical field in its own right, independent
of direct interest in the field of research defined
by the systems in question. For instance, these math- 
ematicians defined the minimal set of independent 
postulates for groups, fields, projective geometry, etc.,
without then proceeding to investigate any of these
individual disciplines. On the other hand, prominent
mathematicians started to adopt and develop increasingly
formalistic views of proof and of mathematical
truth, and began applying them in a growing number
of mathematical fields. The work of the radically modernist
mathematician Felix Hausdorff [VI.68] provides
important examples of this trend, as he was among
the first to consistently associate Hilbert’s achievement
with a new, formalistic view of geometry. In 1904, for 
instance, he wrote: 

In all philosophical debates since Kant, mathematics,
or at least geometry, has always been treated as heteronomous,
as dependent on some external instance
of what we could call, for want of a better term, intuition,
be it pure or empirical, subjective or scientifically
amended, innate or acquired. The most important and
fundamental task of modern mathematics has been to
set itself free from this dependency, to fight its way
through from heteronomy to autonomy.

Hilbert himself would pursue such a point of view 
around 1918, when he engaged in the debates about the
consistency of arithmetic and formulated his “finitist” 
program. This program did indeed adopt a strongly for- 
malistic view, but it did so with the restricted aim of 
solving this particular problem. It is therefore impor- 
tant to stress that Hilbert’s conceptions of geometry 
were, and remained, essentially empiricist and that he 
never regarded his axiomatic analysis of geometry as 
part of an overall formalistic conception of mathemat- 
ics. He considered the axiomatic approach as a tool for 
the conceptual clarification of existing, well-elaborated 
theories, of which geometry provided only the most 
prominent example. 

The implication of Hilbert’s axiomatic approach for 
the concept of proof and of truth in mathematics pro- 
voked strong reactions from some mathematicians, 
and prominently so from Frege [VI.56]. Frege’s views
are closely connected with the changing status of logic 
at the turn of the twentieth century and its gradual pro- 
cess of mathematization and formalization. This pro- 
cess was an outcome of the successive efforts through 
the nineteenth century of Boole [VI.43], De Morgan
[VI.38], Grassmann, Charles S. Peirce, and Ernst
Schröder at formulating an algebra of logic. The most
significant step toward a new, formal conception of 
logic, however, came with the increased understanding 
of the role of the logical quantifiers [I.2 §3.2] (universal,
∀, and existential, ∃) in the process of formulating
a modern mathematical proof. This understanding
emerged in an informal, but increasingly clear, fash- 
ion as part of the process of the rigorization of analy- 
sis and the distancing from visual intuition, especially 
at the hands of Cauchy, Bolzano [VI.28], and Weierstrass.
It was formally defined and systematically codified
for the first time by Frege in his 1879 Begriffsschrift.
Frege’s system, as well as similar ones proposed
later by Peano and by Russell [VI.71], brought to the
fore a clear distinction between propositional connec- 
tives and quantifiers, as well as between logical symbols 
and algebraic or arithmetic ones. 

Frege formulated the idea of a formal system, in 
which one defines in advance all the allowable sym- 
bols, all the rules that produce well-formed formulas, 
all axioms (i.e., certain preselected, well-formed formulas),
and all the rules of inference. In such systems
any deduction can be checked syntactically (in other
words, by purely symbolic means). On the basis of such
systems Frege aimed to produce theories with no log- 
ical gaps in their proofs. This would apply not only to 
analysis and to its arithmetic foundation (the mathematical
fields that provided the original motivation for
his work) but also to the new systems of geometry that
were evolving at the time. On the other hand, in Frege’s
view the axioms of mathematical theories— even if they 
appear in the formal system merely as well-formed for- 
mulas— embody truths about the world. This is pre- 
cisely the source of his criticism of Hilbert. It is the 
truth of the axioms, asserted Frege, that certifies their
consistency, rather than the other way around, as 
Hilbert suggested. 
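
The claim that a deduction in a formal system can be
checked by purely symbolic means can be made concrete
with a small sketch. The Python code below is not
Frege’s system: it assumes, purely for illustration, a toy
propositional calculus with implication as the only
connective, two axiom schemes, and modus ponens as the
only rule of inference. All names (Var, Imp, check_proof)
are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Imp:
    left: "Formula"
    right: "Formula"


Formula = Union[Var, Imp]


def is_axiom_k(f):
    # Scheme K (assumed for this toy system): A -> (B -> A)
    return (isinstance(f, Imp) and isinstance(f.right, Imp)
            and f.left == f.right.right)


def is_axiom_s(f):
    # Scheme S (assumed): (A -> (B -> C)) -> ((A -> B) -> (A -> C))
    if not (isinstance(f, Imp) and isinstance(f.left, Imp)
            and isinstance(f.left.right, Imp) and isinstance(f.right, Imp)
            and isinstance(f.right.left, Imp)
            and isinstance(f.right.right, Imp)):
        return False
    a, b, c = f.left.left, f.left.right.left, f.left.right.right
    return f.right.left == Imp(a, b) and f.right.right == Imp(a, c)


def check_proof(lines):
    """Accept a list of formulas only if each one is an axiom instance
    or follows from two earlier lines by modus ponens."""
    accepted = []
    for f in lines:
        by_modus_ponens = any(Imp(p, f) in accepted for p in accepted)
        if not (is_axiom_k(f) or is_axiom_s(f) or by_modus_ponens):
            return False
        accepted.append(f)
    return True


# A five-line derivation of A -> A in this toy system.
A = Var("A")
AA = Imp(A, A)
proof = [
    Imp(Imp(A, Imp(AA, A)), Imp(Imp(A, AA), AA)),  # instance of S
    Imp(A, Imp(AA, A)),                            # instance of K
    Imp(Imp(A, AA), AA),                           # modus ponens
    Imp(A, AA),                                    # instance of K
    AA,                                            # modus ponens
]
print(check_proof(proof))
```

The checker never asks what A means; it only compares
shapes of formulas, which is the sense in which validity
is established in purely syntactic terms.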

We thus see how foundational research in two separate
fields, geometry and analysis, was inspired by
different methodologies and philosophical outlooks, 
but converged at the turn of the twentieth century 
to create an entirely new conception of mathematical 
proof. In this conception a mathematical proof is seen 
as a purely logical construct validated in purely syntac- 
tic terms, independently of any visualization through 
diagrams. This conception has dominated mathematics 
ever since. 

Epilogue: Proof in the Twentieth Century 

The new notion of proof that stabilized at the beginning 
of the twentieth century provided an idealized model— 
broadly accepted to this day— of what should consti- 
tute a valid mathematical argument. To be sure, actual 
proofs devised and published by mathematicians since 
that time are seldom presented as fully formalized 
texts. They typically present a clearly articulated argu- 
ment in a language that is precise enough to convince 
the reader that it could— in principle, and perhaps with 
straightforward (if sustained) effort— be turned into 
one. Throughout the decades, however, some limita- 
tions of this dominant idea have gradually emerged 
and alternative conceptions of what should count as a 
valid mathematical argument have become increasingly 
accepted as part of current mathematical practice. 

The attempt to pursue this idea systematically to its 
full extent led, early on and very unexpectedly, to a 
serious difficulty with the notion of a proof as a com- 
pletely formalized and purely syntactic deductive argu- 
ment. In the early 1920s, Hilbert and his collaborators 
developed a fully fledged mathematical theory whose 
subject matter was “proof,” considered as an object of 
study in itself. This theory, which presupposed the formal
conception of proof, arose as part of an ambitious
program for providing a direct, finitistic consistency 
proof of arithmetic represented as a formalized sys- 
tem. Hilbert asserted that, just as the physicist exam- 
ines the physical apparatus with which he carries out 
his experiments and the philosopher engages in a cri- 
tique of reason, so the mathematician should be able 
to analyze mathematical proofs and do so strictly by 
mathematical means. About a decade after the program 
was launched, Gödel [VI.92] came up with his astonishing
incompleteness theorem [V.18], which famously
showed that “mathematical truth” and “provability” 
were not one and the same thing. Indeed, in any consis- 
tent, sufficiently rich axiomatic system (including the 
systems typically used by mathematicians) there are 
true mathematical statements that cannot be proved. 
Gödel’s work implied that Hilbert’s finitistic program
was too optimistic, but at the same time it also made 
clear the deep mathematical insights that could be 
obtained from Hilbert’s proof theory. 

A closely related development was the emergence of 
proofs that certain important mathematical statements 
were undecidable. Interestingly, these seemingly neg- 
ative results have given rise to new ideas about the 
legitimate grounds for establishing the truth of such 
statements. For instance, in 1963 Paul Cohen established
that the continuum hypothesis [IV.1 §5] can
be neither proved nor disproved in the usual systems 
of axioms for set theory. Most mathematicians sim- 
ply accept this idea and regard the problem as solved 
(even if not in the way that was originally expected), 
but some contemporary set theorists, notably Hugh 
Woodin, maintain that there are good reasons to believe 
that the hypothesis is false. The strategy they follow in 
order to justify this assertion is fundamentally differ- 
ent from the formal notion of proof: they devise new 
axioms, demonstrate that these axioms have very desir- 
able properties, argue that they should therefore be 
accepted, and then show that they imply the negation of 
the continuum hypothesis. (See set theory [IV.1 §10]
for further discussion.) 

A second important challenge came from the ever- 
increasing length of significant proofs appearing in 
various mathematical domains. A prominent example 
was the classification theorem for finite simple 
groups [V.8], whose proof was worked out in many separate
parts by a large number of mathematicians. The
resulting arguments, if put together, would reach about
ten thousand pages, and errors have been found since
the announcement in the early 1980s that the proof was
complete. It has always been relatively straightforward 
to fix the errors and the theorem is indeed accepted 
and used by group theorists. Nevertheless, the notion 
of a proof that is too long for a single human being to 
check is a challenge to our conception of when a proof 
should be accepted as such. The more recent, very conspicuous
cases of Fermat’s last theorem [V.12] and
the Poincaré conjecture [V.28] were hard to survey
for different reasons: not only were they long (though 
nowhere near as long as the classification of finite sim- 
ple groups), but they were also very difficult. In both 
cases there was a significant interval between the first 
announcement of the proofs and their complete accep- 
tance by the mathematical community because check- 
ing them required enormous efforts by the very few 
people qualified to do so. There is no controversy about 
either of these two breakthroughs, but they do raise an 
interesting sociological problem: if somebody claims to 
have proved a theorem and nobody else is prepared 
to check it carefully (perhaps because, unlike the two 
theorems just mentioned, this one is not important 
enough for another mathematician to be prepared to 
spend the time that it would take), then what is the 
status of the theorem? 

Proofs based on probabilistic considerations have 
also appeared in various mathematical domains, 
including number theory, group theory, and combina- 
torics. It is sometimes possible to prove mathematical 
statements (see, for example, the discussion of random 
primality testing in computational number theory 
[IV.5 §2]), not with complete certainty, but in such a way 
that the probability of error is tiny— at most one in a 
trillion, say. In such cases, we may not have a formal 
proof, but the chances that we are mistaken in considering
the given statement to be true are probably lower
than, say, the chance that there is a significant
mistake in one of the lengthy proofs mentioned above. 
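
As an illustration of how such a probabilistic argument
works, here is a minimal sketch of a randomized primality
test. The text does not say which test it has in mind;
the Miller-Rabin test is assumed here as a standard
example, and the function name and the default of 20
rounds are choices made for this sketch. For a composite
n, each round fails to detect compositeness with
probability at most 1/4, so 20 independent rounds leave
an error probability of at most 4^(-20), which is a little
below one in a trillion.

```python
import random


def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin test.

    Returns False only if n is certainly composite. If n is composite,
    the probability of (wrongly) returning True is at most 4**(-rounds).
    """
    if n < 2:
        return False
    # Dispose of small primes and their multiples directly.
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base with 2 <= a <= n - 2
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True


print(is_probable_prime(2**61 - 1))  # a known Mersenne prime
```

A negative answer is a genuine proof of compositeness;
only the positive answer is probabilistic, which is
exactly the asymmetry discussed above.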

Another challenge has come from the introduction 
of computer-assisted methods of proof. For instance, 
in 1976 Kenneth Appel and Wolfgang Haken settled a 
famous old problem by proving the four-color theorem
[V.14]. Their proof involved the checking of a
huge number of different map configurations, which 
they did with the help of a computer. Initially, this 
raised debates about the legitimacy of their proof but 
it quickly became accepted and there are now sev- 
eral proofs of this kind. Some mathematicians even 
believe that computer-assisted and, more importantly, 
computer-generated proofs are the future of the entire 
discipline. Under this (currently minority) view, our
present views about what counts as an acceptable 
mathematical proof will soon become obsolete. 

A last point to stress is that many branches of math- 
ematics now contain conjectures that seem to be both 
fundamentally important and out of reach for the fore- 
seeable future. Mathematicians persuaded of the truth 
of such conjectures increasingly undertake the sys- 
tematic study of their consequences, assuming that an 
acceptable proof will one day appear (or at least that the 
conjecture is true). Such conditional results are pub- 
lished in leading mathematical journals and doctoral 
degrees are routinely awarded for them. 

All of these trends raise interesting questions 
about existing conceptions of legitimate mathematical 
proofs, the status of truth in mathematics, and the 
relationship between “pure” and “applied” fields. The
formal notion of a proof as a string of symbols that 
obeys certain syntactical rules continues to provide 
an ideal model for the principles that underlie what 
most mathematicians see as the essence of their dis- 
cipline. It allows far-reaching mathematical analysis of 
the power of certain axiomatic systems, but at the same 
time it falls short of explaining the changing ways in
which mathematicians decide what kinds of arguments 
they are willing to accept as legitimate in their actual 
professional practice. 

I thank José Ferreiros and Reviel Netz for useful comments 
on previous versions of this text. 
II.7 The Crisis in the Foundations of
Mathematics

José Ferreiros 


The foundational crisis is a celebrated affair among 
mathematicians and it has also reached a large non- 
mathematical audience. A well-trained mathematician 
is supposed to know something about the three viewpoints
called “logicism,” “formalism,” and “intuitionism”
(to be explained below), and about what Gödel’s
incompleteness results [V.18] tell us about the status
of mathematical knowledge. Professional mathemati- 
cians tend to be rather opinionated about such top- 
ics, either dismissing the foundational discussion as 
irrelevant— and thus siding with the winning party— 
or defending, either as a matter of principle or as an 
intriguing option, some form of revisionist approach 
to mathematics. But the real outlines of the histori- 
cal debate are not well-known and the subtler philo- 
sophical issues at stake are often ignored. Here we 
shall mainly discuss the former, in the hope that this 
will help bring the main conceptual issues into sharper 
focus. 

The foundational crisis is usually understood as a 
relatively localized event in the 1920s, a heated debate 
between the partisans of “classical” (meaning late-nineteenth-century)
mathematics, led by Hilbert [VI.63],
and their critics, led by Brouwer [VI.75], who advocated
strong revision of the received doctrines. There
is, however, a second, and in my opinion very impor- 
tant, sense in which the “crisis” was a long and global 
process, indistinguishable from the rise of modern 
mathematics and the philosophical and methodologi- 
cal issues it created. This is the standpoint from which 
the present account has been written. 

Within this longer process one can still pick out some 
noteworthy intervals. Around 1870 there were many 
discussions about the acceptability of non-Euclidean 
geometries, and also about the proper foundations of 
complex analysis and even the real numbers. Early in 
the twentieth century there were debates about set 
theory, about the concept of the continuum, and about 
the role of logic and the axiomatic method versus 
the role of intuition. By about 1925 there was a cri- 
sis in the proper sense, during which the main opin- 
ions in these debates were developed and turned into 
detailed mathematical research projects. And in the 
1930s Gödel [VI.92] proved his incompleteness results,
which could not be assimilated without some cherished
beliefs being abandoned. Let us analyze some of these 
events and issues in greater detail. 

1 Early Foundational Questions 

There is evidence that in 1899 Hilbert endorsed the 
viewpoint that came to be known as logicism. Logicism 
was the thesis that the basic concepts of mathemat- 
ics are definable by means of logical notions, and that 
the key principles of mathematics are deducible from 
logical principles alone. 

Over time this thesis has become unclear, based as it 
seems to be on a fuzzy and immature conception of the 
scope of logical theory. But historically speaking logi- 
cism was a neat intellectual reaction to the rise of mod- 
ern mathematics, and particularly to the set-theoretic 
approach and methods. Since the majority opinion was 
that set theory is just a part of (refined) logic,¹ this thesis
was thought to be supported by the fact that the
theories of natural and real numbers can be derived
from set theory, and also by the increasingly important
role of set-theoretic methods in algebra and in real and 
complex analysis. 

Hilbert was following Dedekind [VI.50] in the way
he understood mathematics. For us, the essence of 
Hilbert’s and Dedekind’s early logicism is their self- 
conscious endorsement of certain modern methods, 
however daring they seemed at the time. These methods
had emerged gradually during the nineteenth century,
and were particularly associated with Göttingen
mathematics (Gauss [VI.26] and Dirichlet [VI.36]);
they experienced a crucial turning point with Riemann’s
[VI.49] novel ideas, and were developed further
by Dedekind, Cantor [VI.54], Hilbert, and other,
lesser figures. Meanwhile, the influential Berlin school
of mathematics had opposed this new trend, Kronecker
[VI.48] head-on and Weierstrass [VI.44] more
subtly. (The name of Weierstrass is synonymous with
the introduction of rigor in real analysis, but in fact, as
will be indicated below, he did not favor the more mod- 
ern methods elaborated in his time.) Mathematicians in 
Paris and elsewhere also harbored doubts about these 
new and radical ideas. 

The most characteristic traits of the modern ap- 
proach were: 


1. One should mention that key figures like Riemann and Cantor 
disagreed (see Ferreirós 1999). The “majority” included Dedekind,
Peano [VI.62], Hilbert, Russell [VI.71], and others.


(i) acceptance of the notion of an “arbitrary” function
proposed by Dirichlet;

(ii) a wholehearted acceptance of infinite sets and the
higher infinite;

(iii) a preference “to put thoughts in the place of 
calculations” (Dirichlet), and to concentrate on 
“structures” characterized axiomatically; and 

(iv) a reliance on “purely existential” methods of 
proof. 

An early and influential example of these traits was 
Dedekind’s approach (1871) to algebraic number
theory [IV.3]: his set-theoretic definition of number
fields [III.65] and ideals [III.83 §2], and the methods
by which he proved results such as the fundamental
theorem of unique decomposition. In a remarkable
departure from the number-theoretic tradition,
Dedekind studied the factorization properties of alge- 
braic integers in terms of ideals, which are certain infi- 
nite sets of algebraic integers. Using this new abstract 
concept, together with a suitable definition of the 
product of two ideals, Dedekind was able to prove in 
full generality that, within any ring of algebraic inte- 
gers, ideals possess a unique decomposition into prime 
ideals. 

The influential algebraist Kronecker complained that 
Dedekind’s proofs do not enable us to calculate, in 
a particular case, the relevant divisors or ideals: that 
is, the proof was purely existential. Kronecker’s view 
was that this abstract way of working, made possible 
by the set-theoretic methods and by a concentration 
on the algebraic properties of the structures involved, 
was too remote from an algorithmic treatment, that is,
from so-called constructive methods. But for Dedekind
this complaint was misguided: it merely showed that 
he had succeeded in elaborating the principle “to put 
thoughts in the place of calculations,” a principle that 
was also emphasized in Riemann’s theory of complex 
functions. Obviously, concrete problems would require
the development of more delicate computational tech- 
niques, and Dedekind contributed to this in several 
papers. But he also insisted on the importance of a 
general, conceptual theory. 

The ideas and methods of Riemann and Dedekind 
became better known through publications of the 
period 1867–72. These were found particularly shocking
because of their very explicit defense of the view
that mathematical theories ought not to be based
upon formulas and calculations; they should always
be based on clearly formulated general concepts, with
analytical expressions or calculating devices relegated
to the further development of the theory. 

To explain the contrast, let us consider the par- 
ticularly clear case of the opposition between Rie- 
mann’s and Weierstrass’s approaches to function 
theory. Weierstrass opted systematically for explicit 
representations of analytic (or holomorphic [I.3 §5.6])
functions by means of power series of the form
$\sum_{n=0}^{\infty} a_n (z - a)^n$, connected with each other by
analytic continuation [I.3 §5.6]. Riemann chose a very
different and more abstract approach, defining a function
to be analytic if it satisfies the Cauchy–Riemann
differentiability conditions [I.3 §5.6].² This neat
conceptual definition appeared objectionable to Weier- 
strass, as the class of differentiable functions had never 
been carefully characterized (in terms of series repre- 
sentations, for example). Exercising his famous critical 
abilities, Weierstrass offered examples of continuous 
functions that were nowhere differentiable. 
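
For reference, the two definitions being contrasted can be displayed side by side. This is only a sketch in modern notation: the symbols u, v, x, and y are introduced here for illustration, writing f = u + iv and z = x + iy, and the power series is understood to represent f near the point a.

```latex
\[
  \text{Weierstrass:}\quad f(z) = \sum_{n=0}^{\infty} a_n (z - a)^n
  \qquad\qquad
  \text{Riemann:}\quad
  \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},
  \quad
  \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.
\]
```

The first condition exhibits an explicit expression for the function; the second states a property that the function must have, without saying how to write it down.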

It is worth mentioning that, in preferring infinite 
series as the key means for research in analysis and 
function theory, Weierstrass remained closer to the
old eighteenth-century idea of a function as an
analytical expression. On the other hand, Riemann and
Dedekind were always in favor of Dirichlet’s abstract
idea of a function f as an “arbitrary” way of associating
with each x some y = f(x). (Previously it had
been required that y should be expressed in terms 
of x by means of an explicit formula.) In his let- 
ters, Weierstrass criticized this conception of Dirich- 
let’s as too general and vague to constitute the starting 
point for any interesting mathematical development. 
He seems to have missed the point that it was in fact
just the right framework in which to define and analyze
general concepts such as continuity [I.3 §5.2]
and integration [I.3 §5.5]. This framework came to be
called the conceptual approach in nineteenth-century 
mathematics. 

Similar methodological debates emerged in other 
areas too. In a letter of 1870, Kronecker went as far 
as saying that the Bolzano-Weierstrass theorem was 
an “obvious sophism,” promising that he would offer 
counterexamples. The Bolzano-Weierstrass theorem, 
which states that an infinite bounded set of real numbers
has an accumulation point, was a cornerstone


2. Riemann determined particular functions by a series of indepen- 
dent traits such as the associated riemann surface [III.81] and the 
behavior at singular points. These traits determined the function via a 
certain variational principle (the “Dirichlet principle”), which was also 
criticized by Weierstrass, who gave a counterexample to it. Hilbert and 
Kneser would later reformulate and justify the principle.

of classical analysis, and was emphasized as such by
Weierstrass in his famous Berlin lectures. The problem
for Kronecker was that this theorem rests entirely on 
the completeness axiom for the real numbers (which, 
in one version, states that every sequence of nonempty
nested closed intervals in R has a nonempty intersec- 
tion). The real numbers cannot be constructed in an 
elementary way from the rational numbers: one has 
to make heavy use of infinite sets (such as the set of 
all possible “Dedekind cuts,” which are subsets $C \subset \mathbb{Q}$
such that $p \in C$ whenever p and q are rational numbers
such that $p < q$ and $q \in C$). To put it another
way: Kronecker was drawing attention to the problem 
that, very often, the accumulation point in the Bolzano- 
Weierstrass theorem cannot be constructed by elemen- 
tary operations from the rational numbers. The classi- 
cal idea of the set of real numbers, or “the continuum,” 
already contained the seeds of the nonconstructive 
ingredient in modern mathematics. 
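
To make the notion of a cut concrete, here is a minimal Python sketch of one particular cut, the one that would correspond to the square root of 2. The function name and the sample rationals are illustrative choices, not part of the historical material. A single cut like this one is given by an explicit rule; the nonconstructive element discussed above enters only when one speaks of the set of all possible cuts.

```python
from fractions import Fraction


def in_sqrt2_cut(p):
    """Membership test for the cut {p in Q : p < 0 or p*p < 2}."""
    p = Fraction(p)
    return p < 0 or p * p < 2


# Check the defining property of a cut on a few sample rationals:
# whenever q is in the cut and p < q, p is in the cut as well.
samples = [Fraction(n, 10) for n in range(-30, 31)]
assert all(in_sqrt2_cut(p)
           for q in samples if in_sqrt2_cut(q)
           for p in samples if p < q)

print(in_sqrt2_cut(Fraction(7, 5)), in_sqrt2_cut(Fraction(3, 2)))  # True False
```

Exact rational arithmetic is used so that the membership test never depends on floating-point rounding.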

Later on, in around 1890, Hilbert’s work on invari- 
ant theory led to a debate about his purely existen- 
tial proof of another basic result, the basis theorem, 
which states (in modern terminology) that every ideal
in a polynomial ring is finitely generated. Paul Gor- 
dan, famous as the “king” of invariants for his heavily 
algorithmic work on the topic, remarked humorously 
that this was “theology,” not mathematics! (He appar- 
ently meant that, because the proof was purely existen- 
tial, rather than constructive, it was comparable with 
philosophical proofs of the existence of God.) 

This early foundational debate led to a gradual clari- 
fication of the opposing viewpoints. Cantor’s proofs in 
set theory also became quintessential examples of the 
modern methodology of existential proof. He offered 
an explicit defense of the higher infinite and modern 
methods in a paper of 1883, which was peppered with 
hidden attacks on Kronecker’s views. Kronecker in turn 
criticized Dedekind’s methods publicly in 1882, spoke
privately against Cantor, and in 1887 published an 
attempt to spell out his foundational views. Dedekind 
replied with a detailed set-theoretic (and “thus,” for 
him, logicistic) theory of the natural numbers in 1888. 

The early round of criticism ended with an apparent 
victory for the modern camp, which enrolled new and 
powerful allies such as Hurwitz, Minkowski [VI.64],
Hilbert, Volterra, Peano, and Hadamard [VI.65], and
which was defended by influential figures such as Klein
[VI.57]. Although Riemannian function theory was still
in need of further refinement, recent developments
in real analysis, number theory, and other fields were
showing the power and promise of the modern methods.
During the 1890s, the modern viewpoint in general,
and logicism in particular, enjoyed great expansion.
Hilbert developed the new methodology into the
axiomatic method, which he used to good effect in his 
treatment of geometry (1899 and subsequent editions) 
and of the real number system. 

Then, dramatically, came the so-called logical para- 
doxes, discovered by Cantor, Russell, Zermelo, and oth- 
ers, which will be discussed below. These were of two 
kinds. On the one hand, there were arguments showing
that assumptions that certain sets exist lead to
contradictions. These were later called the set-theoretic 
paradoxes. On the other, there were arguments, later 
known as the semantic paradoxes, which showed up 
difficulties with the notions of truth and definability. 
These paradoxes completely destroyed the attractive 
view of recent developments in mathematics that had 
been proposed by logicism. Indeed, the heyday of logi- 
cism came before the paradoxes, that is, before 1900; 
it subsequently enjoyed a revival with Russell and his 
“theory of types,” but by 1920 logicism was of inter- 
est more to philosophers than to mathematicians. How- 
ever, the divide between advocates of the modern meth- 
ods and constructivist critics of these methods was 
there to stay. 

2 Around 1900 

Hilbert opened his famous list of mathematical problems
at the Paris International Congress of Mathematicians
of 1900 with Cantor’s continuum problem [IV.1 §5],
a key question in set theory, and with the problem 
of whether every set can be well-ordered. His second 
problem amounted to establishing the consistency of 
the notion of the set R of real numbers. It was not by 
chance that he began with these problems: rather, it 
was a way of making a clear statement about how math- 
ematics should be in the twentieth century. Those two 
problems, and the axiom of choice [III.1] employed
by Hilbert’s young colleague Zermelo to show that R 
(the continuum) can be well-ordered, are quintessential 
examples of the traits (i)-(iv) that were listed above. It 
is little wonder that less daring minds objected and 
revived Kronecker’s doubts, as can be seen in many 
publications of 1905-6. This brings us to the next stage 
of the debate. 


2.1 Paradoxes and Consistency 

In a remarkable turn of events, the champions of mod- 
ern mathematics stumbled upon arguments that cast 
new doubts on its cogency. In around 1896, Cantor dis- 
covered that the seemingly harmless concepts of the 
set of all ordinals and the set of all cardinals led to 
contradictions. In the former case the contradiction 
is usually called the Burali-Forti paradox; the latter is 
the Cantor paradox. The assumption that all transfinite
ordinals form a set leads, by Cantor’s previous
results, to the result that there is an ordinal that is less 
than itself, and similarly for cardinals. Upon learning
of these paradoxes, Dedekind began to doubt whether 
human thought is completely rational. Even worse, in 
1901-2 Zermelo and Russell discovered a very elementary
contradiction, now known as Russell’s paradox or
sometimes as the Zermelo-Russell paradox, which will 
be discussed in a moment. The untenability of the 
previous understanding of set theory as logic became 
clear, and there began a new period of instability. But 
it should be said that only logicists were seriously 
upset by these arguments: they were presented with 
contradictions in their theories. 

Let us explain the importance of the Zermelo-Russell 
paradox. From Riemann to Hilbert, many authors 
accepted the principle that, given any well-defined logical
or mathematical property, there exists a set of all
objects satisfying that property. In symbols: given a
well-defined property $p$, there exists another object,
the set $\{x : p(x)\}$. For example, corresponding to the
property of “being a real number” (which is expressed 
formally by Hilbert’s axioms) there is the set of all real 
numbers; corresponding to the property of “being an 
ordinal” there is the set of all ordinals; and so on. This 
is called the comprehension principle, and it constitutes 
the basis for the logicistic understanding of set theory, 
often called naive set theory, although its naivete is 
only clear with hindsight. The principle was thought 
of as a basic logical law, so that all of set theory was 
merely a part of elementary logic. 

The Zermelo-Russell paradox shows that the com- 
prehension principle is contradictory, and it does so 
by formulating a property that seems to be as basic 
and purely logical as possible. Let $p(x)$ be the property
$x \notin x$ (bearing in mind that negation and membership
were assumed to be purely logical concepts).
The comprehension principle yields the existence of
the set $R = \{x : x \notin x\}$, but this leads quickly to a
contradiction: if $R \in R$, then $R \notin R$ (by the definition
of $R$), and similarly, if $R \notin R$, then $R \in R$. Hilbert (like
his older colleague Frege [VI.56]) was led to abandon
logicism, and even wondered whether Kronecker might 
have been right all along. Eventually he concluded that 
set theory had shown the need to refine logical theory. 
It was also necessary to establish set theory axiomat- 
ically, as a basic mathematical theory based on math- 
ematical (not logical) axioms, and Zermelo undertook 
this task. 
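
The Zermelo-Russell contradiction can be written out in a single line. The display below is a sketch in modern notation that uses nothing beyond the comprehension principle as stated above: one forms the set R of all sets that are not members of themselves and then instantiates its defining condition at x = R.

```latex
\[
  R = \{\, x : x \notin x \,\}
  \quad\Longrightarrow\quad
  \forall x\, (x \in R \iff x \notin x)
  \quad\Longrightarrow\quad
  (R \in R \iff R \notin R).
\]
```

No statement can be equivalent to its own negation, so the set R cannot exist, and unrestricted comprehension fails.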

Hilbert famously advocated that to claim that a set of 
mathematical objects exists is tantamount to proving 
that the corresponding axiom system is consistent— 
that is, free of contradictions. The documentary evi- 
dence suggests that Hilbert came to this celebrated 
principle in reaction to Cantor’s paradoxes. His rea- 
soning may have been that, instead of jumping directly 
from well-defined concepts to their corresponding sets, 
one had first to prove that the concepts are logically 
consistent. For example, before one could accept the set 
of all real numbers, one should prove the consistency 
of Hilbert’s axiom system for them. Hilbert’s principle 
was a way of removing any metaphysical content from 
the notion of mathematical existence. This view, that 
mathematical objects had a sort of “ideal existence” in 
the realm of thought rather than an independent meta- 
physical existence, had been anticipated by Dedekind 
and Cantor. 

The “logical” paradoxes included not only the ones 
that go by the names of Burali-Forti, Cantor, and Rus- 
sell, but also many semantic paradoxes formulated by 
Russell, Richard, König, Grelling, etc. (Richard’s paradox
will be discussed below.) Much confusion emerged
from the abundance of different paradoxes, but one 
thing is clear: they played an important role in promot- 
ing the development of modern logic and convincing 
mathematicians of the need for strictly formal presen- 
tation of their theories. Only when a theory has been 
stated within a precise formal language can one disre- 
gard the semantic paradoxes, and even formulate the 
distinction between these and the set-theoretic ones. 

2.2 Predicativity 

When the books of Frege and Russell made the paradoxes
of set theory widely known to the mathematical
community in 1903, Poincaré [VI.61] used them to put
forward criticisms of both logicism and formalism. 

His analysis of the paradoxes led him to coin an 
important new notion, predicativity, and maintain that 
impredicative definitions should be avoided in mathematics.
Informally, a definition is impredicative when
it introduces an element by reference to a totality that
already contains that element. A typical example is
the following: Dedekind defines the set N of natural
numbers as the intersection of all sets that contain
1 and are closed under an injective function $\sigma$ such
that $1 \notin \sigma(N)$. (The function $\sigma$ is called the successor
function.) His idea was to characterize N as minimal,
but in his procedure the set N is first introduced by 
appeal to a totality of sets that should already include 
N itself. This kind of procedure appeared unacceptable 
to Poincaré (and also to Russell), especially when the 
relevant object can be specified only by reference to 
the more embracing totality. Poincaré found examples 
of impredicative procedures in each of the paradoxes 
he studied. 
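
The circularity in Dedekind's definition can be made visible in symbols. The following is a sketch in modern notation, writing $\sigma$ for the successor function; the name $\mathcal{C}$ for the family of admissible sets is a label introduced here, not Dedekind's own.

```latex
\[
  \mathcal{C} = \{\, S : 1 \in S \ \text{and}\ \sigma(S) \subseteq S \,\},
  \qquad
  N = \bigcap_{S \in \mathcal{C}} S .
\]
```

Since N contains 1 and is closed under $\sigma$, N is itself a member of $\mathcal{C}$: the totality over which the intersection is taken already includes the set being defined.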

Take, for instance, Richard’s paradox, which is one 
of the linguistic or semantic paradoxes (where, as we 
said, the notions of truth and definability are promi- 
nent). One begins with the idea of definable real num- 
bers. Because definitions must be expressed in a certain 
language by finite expressions, there are only count- 
ably many definable numbers. Indeed, we can explic- 
itly count the definable real numbers by listing them in 
alphabetical order of their definitions. (This is known 
as the lexicographic order.) Richard’s idea was to apply 
to this list a diagonal process, of the kind used by Cantor
to prove that R is not countable [III.11]. Let the
definable numbers be $a_1, a_2, a_3, \ldots$. Define a new number
r in a systematic way, making sure that the nth
decimal digit of r is different from the nth decimal
digit of $a_n$. (For example, let the nth digit of r be 2
unless the nth digit of $a_n$ is 2, in which case let the
nth digit of r be 4.) Then r cannot belong to the set
of definable numbers. But in the course of this con- 
struction, the number r has just been defined in finitely 
many words! Poincaré would ban impredicative defini- 
tions and would therefore prevent the introduction of 
the number r, since it was defined with reference to the 
totality of all definable numbers.³
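
The diagonal step itself is entirely mechanical, as the following Python sketch suggests. It applies the digit rule from the example above (use 2, unless the digit on the diagonal is 2, in which case use 4) to a finite list of decimal expansions. The function name and the sample digit strings are hypothetical, and a finite list can only illustrate the rule; it does not reproduce the paradox, which depends on the list of all definable numbers.

```python
def diagonal_digits(expansions):
    """Return a digit string whose n-th digit differs from the n-th digit
    of the n-th expansion in the list (positions counted from zero here).

    Rule from the text: use 2, unless the diagonal digit is 2, then use 4.
    """
    digits = []
    for n, expansion in enumerate(expansions):
        digits.append("4" if expansion[n] == "2" else "2")
    return "".join(digits)


# Hypothetical sample: the first four decimal digits of four numbers.
sample = ["1415", "7182", "5222", "3332"]
r = diagonal_digits(sample)
print("0." + r)  # 0.2244

# r differs from the n-th number in its n-th digit, so it is not on the list.
assert all(r[n] != a[n] for n, a in enumerate(sample))
```

Using only the digits 2 and 4 sidesteps the fact that some numbers have two decimal expansions, one ending in repeated 9s and one in repeated 0s.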

In this kind of approach to the foundations of math- 
ematics, all mathematical objects (beyond the natural 
numbers) must be introduced by explicit definitions. 
If a definition refers to a presumed totality of which 
the object being defined is itself a member, we are 
involved in a circle: the object itself is then a con- 
stituent of its own definition. In this view, “definitions” 


3. The modern solution is to establish mathematical definitions
within a well-determined formal theory, whose language and expres- 
sions are fixed to begin with. Richard’s paradox takes advantage of an 
ambiguity as to what the available means of definition are.

must be predicative: one refers only to totalities that
have already been established before the object one is
defining. Important authors such as Russell and Weyl
[VI.80] accepted this point of view and developed it. 

Zermelo was not convinced, arguing that impredica- 
tive definitions were often used unproblematically, not 
only in set theory (as in Dedekind’s definition of N, for 
example), but also in classical analysis. As a particular 
example, he cited Cauchy’s [VI.29] proof of the fundamental
theorem of algebra [V.15],⁴ but a simpler
example of impredicative definition is the least
upper bound in real analysis. The real numbers are 
not introduced separately, by explicit predicative defi- 
nitions of each one of them; rather, they are introduced 
as a completed whole, and the particular way in which 
the least upper bound of an infinite bounded set of
reals is singled out becomes impredicative. But Zermelo 
insisted that these definitions are innocuous, because 
the object being defined is not “created” by the defini- 
tion; it is merely singled out (see his paper of 1908 in 
van Heijenoort (1967, pp. 183-98)). 

Poincaré’s idea of abolishing impredicative defini- 
tions became important for Russell, who incorporated 
it as the “vicious circle principle” in his influential 
theory of types. Type theory is a system of higher-order 
logic, with quantification over properties or sets, over 
relations, over sets of sets, and so on. Roughly speak- 
ing, it is based on the idea that the elements of any 
set should always be objects of a certain homogeneous 
type. For instance, we can have sets of “individuals,” 
such as {a, b}, or sets of sets of individuals, such as 
{{a},{a,b}}, but never a “mixed” set like {a, {a, b}}. 
Russell’s version of type theory became rather complicated
because of the so-called ramification he adopted
in order to avoid impredicativity. This system, together 
with axioms of infinity, choice, and “reducibility” (a 
surprisingly ad hoc means to “collapse” the ramifica- 
tion), sufficed for the development of set theory and the 
number systems. Thus it became the logical basis for 
the renowned Principia Mathematica by Whitehead and 
Russell (1910-13), in which they carefully developed a 
foundation for mathematics. 
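
The homogeneity requirement of simple type theory can be illustrated with a short Python sketch. It models sets as frozensets, treats anything else as an “individual” of level 0, and rejects mixed sets. This is only an illustration of the idea of type levels, with names chosen here for the purpose; it is not a rendering of Russell's ramified system, and the empty set is deliberately left out because its level would be ambiguous in so simple a model.

```python
def type_level(obj):
    """Return 0 for an individual, and k + 1 for a set whose members
    all have level k. Raise TypeError for a "mixed" set."""
    if not isinstance(obj, frozenset):
        return 0  # anything that is not a set counts as an individual
    if not obj:
        raise ValueError("empty set: level is ambiguous in this sketch")
    levels = {type_level(member) for member in obj}
    if len(levels) != 1:
        raise TypeError("mixed set: members have different types")
    return levels.pop() + 1


a, b = "a", "b"
print(type_level(frozenset({a, b})))                               # 1
print(type_level(frozenset({frozenset({a}), frozenset({a, b})})))  # 2
try:
    type_level(frozenset({a, frozenset({a, b})}))
except TypeError as err:
    print("rejected:", err)
```

The three calls correspond to the three examples in the text: a set of individuals, a set of sets of individuals, and a mixed set that the type discipline forbids.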

Type theory remained the main logical system until 
about 1930, but under the form of simple type theory 


4. Cauchy’s reasoning was clearly nonconstructive, or “purely existential”
as we have been saying. In order to show that the polynomial
must have one root, Cauchy studied the absolute value of the polynomial,
which has a global minimum a. This global minimum is impredicatively
defined. Cauchy assumed that it was positive, and from this
he derived a contradiction. 


(that is, without ramification), which, as Chwistek, Ram- 
sey, and others realized, suffices for a foundation in 
the style of Principia. Ramsey proposed arguments that 
were aimed at eliminating worries about impredicativ- 
ity, and he tried to justify the other existence axioms 
of Principia — the axiom of infinity and the axiom of 
choice — as logical principles. But his arguments were 
inconclusive. Russell’s attempt to rescue logicism from 
the paradoxes remained unconvincing, except to some 
philosophers (especially members of the Vienna Circle). 

Poincaré’s suggestions also became a key principle 
for the interesting foundational approach proposed by 
Weyl in his book Das Kontinuum (1918). The main idea 
was to accept the theory of the natural numbers as they 
were conventionally developed using classical logic, 
but to work predicatively from there on. Thus, unlike 
Brouwer, Weyl accepted the principle of the excluded 
middle. (This, and Brouwer’s views, will be discussed in 
the next section.) However, the full system of the real 
numbers was not available to him: in his system the set 
R was not complete and the Bolzano-Weierstrass theo- 
rem failed, which meant that he had to devise sophisti- 
cated replacements for the usual derivations of results 
in analysis. 

The idea of predicative foundations for mathematics, 
in the style of Weyl, has been carefully developed in 
recent decades with noteworthy results (see Feferman 
1998). Predicative systems lie between those that coun- 
tenance all of the modern methodology and the more 
stringent constructivistic systems. This is one of sev- 
eral foundational approaches that do not fit into the 
conventional but by now outdated triad of logicism, 
formalism, and intuitionism. 

2.3 Choices 

As important as the paradoxes were, their impact on 
the foundational debate has often been overstated. One 
frequently finds accounts that take the paradoxes as 
the real starting point of the debate, in strong con- 
trast with our discussion in section 1. But even if we 
restrict our attention to the first decade of the twenti- 
eth century, there was another controversy of equal, 
if not greater, importance: the arguments that sur- 
rounded the axiom of choice and Zermelo’s proof of 
the well-ordering theorem. 

Recall from section 2.1 that the association between 
sets and their defining properties was at the time 
deeply ingrained in the minds of mathematicians and 
logicians (via the contradictory principle of comprehension).
The axiom of choice (AC) is the principle that,
given any infinite family of disjoint nonempty sets,
there is a set, known as a choice set, that contains 
exactly one element from each set in the family. The 
problem with this, said the critics, is that it merely 
stipulates the existence of the choice set and does not 
give a defining property for it. Indeed, when it is possi- 
ble to characterize the choice set explicitly, then the 
use of AC is avoidable! But in the case of Zermelo’s 
well-ordering theorem it is essential to employ AC. The 
required well-ordering of R “exists” in the ideal sense of 
Cantor, Dedekind, and Hilbert, but it seemed clear that 
it was completely out of reach from any constructivist 
perspective. 
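
The remark that AC is avoidable whenever the choice set can be characterized explicitly can be illustrated by a small Python sketch. Here a finite family of disjoint nonempty sets of natural numbers stands in for an infinite one, and the rule “take the least element” supplies the defining property; the function name and the sample family are hypothetical. No comparable rule is available for arbitrary sets of real numbers, which need not have a least element, and that is where the axiom does real work.

```python
def choice_set(family):
    """Select the least element of each set in the family.

    The rule "take the minimum" is an explicit defining property,
    so no appeal to the axiom of choice is involved.
    """
    return {min(members) for members in family}


# Hypothetical finite family of disjoint nonempty sets of natural numbers.
family = [{3, 1, 2}, {10, 7}, {42}]
chosen = choice_set(family)
print(sorted(chosen))  # [1, 7, 42]

# Exactly one chosen element lies in each set of the family.
assert all(len(chosen & members) == 1 for members in family)
```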

Thus, the axiom of choice exacerbated obscurities in 
previous conceptions of set theory, forcing mathemati- 
cians to introduce much-needed clarifications. On the 
one hand, AC was nothing but an explicit statement
of previous views about arbitrary subsets, and yet, on 
the other, it obviously clashed with strongly held views 
about the need to explicitly define infinite sets by prop- 
erties. The stage was set for deep debate. The discus- 
sions about this particular topic contributed more than 
anything else to a clarification of the existential implications
of modern mathematical methods. It is instructive
to know that Borel [VI.70], Baire, and Lebesgue
[VI.72], who became critics, had all relied on AC in less 
obvious ways in order to prove theorems of analysis. 
Not by chance, the axiom was suggested to Zermelo 
by an analyst, Erhard Schmidt, who was a student of 
Hilbert.⁵

After the publication of Zermelo’s proof, an intense 
debate developed throughout Europe. Zermelo was 
spurred on to work out the foundations of set theory 
in an attempt to show that his proof could be developed
within an unexceptionable axiom system. The outcome
was his famous axiom system [IV.1 §3], a masterpiece
that emerged from careful analysis of set theory
as it was historically given in the contributions of Can- 
tor and Dedekind and in Zermelo’s own theorem. With 
some additions due to Fraenkel and von Neumann
[VI.91] (the axioms of replacement and regularity) and
the major innovation proposed by Weyl and Skolem
[VI.81] (to formulate it within first-order logic, i.e., 
quantifying over individuals, the sets, but not over their 
properties), the axiom system became in the 1920s the 
one that we now know. 


5. One may still gain much insight by reading the letters exchanged 
by the French analysts in 1905 (see Moore 1982; Ewald 1996) and
Zermelo’s clever arguments in his second 1908 proof of well-ordering
(van Heijenoort 1967).

The ZFC system (this stands for “Zermelo-Fraenkel
with choice”) codifies the key traits of modern math- 
ematical methodology, offering a satisfactory frame- 
work for the development of mathematical theories 
and the conduct of proofs. In particular, it includes 
strong existence principles, allows impredicative definitions
and arbitrary functions, warrants purely existential
proofs, and makes it possible to define the main
mathematical structures. It thus exhibits all the ten- 
dencies (i)-(iv) mentioned in section 1. Zermelo’s own 
work was completely in line with Hilbert’s informal 
axiomatizations of about 1900, and he did not for- 
get to promise a proof of consistency. Axiomatic set 
theory, whether in the Zermelo-Fraenkel presentation 
or the von Neumann-Bernays-Gödel version, is the system
that most mathematicians regard as the working
foundation for their discipline. 

As of 1910, the contrast between Russell’s type 
theory and Zermelo’s set theory was strong. The for- 
mer system was developed within formal logic, and its 
point of departure (albeit later compromised for prag- 
matic reasons) was in line with predicativism; in order 
to derive mathematics, the system needed the existen- 
tial assumptions of infinity and choice, but these were 
rhetorically treated as tentative hypotheses rather than 
outright axioms. The latter system was presented infor- 
mally, adopted the impredicative standpoint whole- 
heartedly, and asserted as axioms strong existential 
assumptions that were sufficient to derive all of classi- 
cal mathematics and Cantor’s theory of the higher infi- 
nite. In the 1920s the separation diminished greatly, 
especially with respect to the first two traits just indicated.
Zermelo’s system was perfected and formulated
within the language of modern formal logic. And the 
Russellians adopted simple type theory, thus accept- 
ing the impredicative and “existential” methodology of 
modern mathematics. This is often given the (poten- 
tially confusing) term “Platonism”: the objects that the 
theory refers to are treated as if they were independent 
of what the mathematician can actually and explicitly 
define. 

Meanwhile, back in the first decade of the twentieth 
century, a young mathematician in the Netherlands was 
beginning to find his way toward a philosophically col- 
ored version of constructivism. Brouwer presented his 
strikingly peculiar metaphysical and ethical views in 
1905, and started to elaborate a corresponding foundation
for mathematics in his thesis of 1907. His philosophy
of “intuitionism” derived from the old metaphysical
view that individual consciousness is the one and
only source of knowledge. This philosophy is perhaps
of little interest in itself, so we shall concentrate here 
on Brouwer’s constructivistic principles. In the years 
around 1910, Brouwer became a renowned mathemati- 
cian, with crucial contributions to topology such as his 
fixed-point theorem [V.13]. By the end of World War I,
he started to publish detailed elaborations of his foun- 
dational ideas, helping to create the famous “crisis,” to 
which we now turn. He was also successful in establish- 
ing the customary (but misleading) distinction between 
formalism and intuitionism. 

3 The Crisis in a Strict Sense 

In 1921, the Mathematische Zeitschrift published a 
paper by Weyl in which the famous mathematician, who 
was a disciple of Hilbert, openly espoused intuitionism 
and diagnosed a “crisis in the foundations” of math- 
ematics. The crisis pointed toward a “dissolution” of 
the old state of analysis, by means of Brouwer’s “revolution.”
Weyl’s paper was meant as a propaganda pamphlet
to rouse the sleepers, and it certainly did. Hilbert
answered in the same year, accusing Brouwer and Weyl
of attempting a “putsch” aimed at establishing
“dictatorship à la Kronecker” (see the relevant papers in
Mancosu (1998) and van Heijenoort (1967)). The foundational
debate shifted dramatically toward the battle
between Hilbert’s attempts to justify “classical” mathematics
and Brouwer’s developing reconstruction of a
much-reformed intuitionistic mathematics. 

Why was Brouwer “revolutionary”? Up to 1920 the 
key foundational issues had been the acceptability of 
the real numbers and, more fundamentally, of the 
impredicativity and strong existential assumptions of 
set theory, which supported the higher infinite and 
the unrestricted use of existential proofs. Set theory 
and, by implication, classical analysis had been crit- 
icized for their reliance on impredicative definitions 
and for their strong existential assumptions (in partic- 
ular, the axiom of choice, of which extensive use was 
made by Sierpiński [VI.77] in 1918). Thus, the debate 
in the first two decades of the twentieth century was 
mainly about which principles to accept when it came 
to defining and establishing the existence of sets and 
subsets. A key question was, can one make rigorous the 
vague idea behind talk of “arbitrary subsets”? The most 
coherent reactions had been Zermelo’s axiomatization 
of set theory and Weyl's predicative system in Das 
Kontinuum. (The Principia Mathematica of Whitehead 
and Russell was an unsuccessful compromise between 
predicativism and classical mathematics.) 


Brouwer, however, brought new and even more basic 
questions to the fore. No one had questioned the tra- 
ditional ways of reasoning about the natural numbers: 
classical logic, in particular the use of quantifiers and 
the principle of the excluded middle, had been used in 
this context without hesitation. But Brouwer put for- 
ward principled critiques of these assumptions and 
started developing an alternative theory of analysis 
that was much more radical than Weyl’s. In doing so, he 
came upon a new theory of the continuum, which finally 
enticed Weyl and made him announce the coming of a 
new age. 

3.1 Intuitionism 

Brouwer began the systematic development of his 
views with two papers on “intuitionistic set theory,” 
written in German and published in 1918 and 1919 by 
the Verhandelingen of the Dutch Academy of Sciences. 
These contributions were part of what he regarded as 
the “Second Act” of intuitionism. The “First Act” (from 
1907) had been his emphasis on the intuitive founda- 
tions of mathematics. Already Klein and Poincaré had 
insisted that intuition has an inescapable role to play 
in mathematical knowledge: as important as logic is in 
proofs and in the development of mathematical theory, 
mathematics cannot be reduced to pure logic; theories 
and proofs are of course organized logically, but their 
basic principles (axioms) are grounded in intuition. But 
Brouwer went beyond them and insisted on the abso- 
lute independence of mathematics from language and 
logic. 

From 1907, Brouwer rejected the principle of the 
excluded middle (PEM), which he regarded as equiva- 
lent to Hilbert’s conviction that all mathematical prob- 
lems are solvable. PEM is the logical principle that the 
statement p ∨ ¬p (that is, either p or not p) must always 
be true, whatever the proposition p may be. (For example, 
it follows from PEM that either the decimal expansion 
of π contains infinitely many sevens or it contains 
only finitely many sevens, even though we do not have 
a proof of which.) Brouwer held that our customary log- 
ical principles were abstracted from the way we dealt 
with subsets of a finite set, and that it was wrong to 
apply them to infinite sets as well. After World War I he 
started the systematic reconstruction of mathematics. 

The intuitionist position is that one can only state “p 
or q” when one can give either a constructive proof of 
p or a constructive proof of q. This standpoint has the 
consequence that proofs by contradiction (reductio ad 
absurdum) are not valid. Consider Hilbert’s first proof 
of his basis theorem (section 1), achieved by reductio: 
he showed that one can derive a contradiction from the 
assumption that the basis is infinite, and from this he 
concluded that the basis is finite. The logic behind this 
procedure is that we start from a concrete instance of 
PEM, p ∨ ¬p, show that ¬p is untenable, and conclude 
that p must be true. But constructive mathematics asks 
for explicit procedures for constructing each object that 
is assumed to exist, and explicit constructions behind 
any mathematical statement. Similarly, we have men- 
tioned before (section 2.1) Cauchy’s proof of the fun- 
damental theorem of algebra, as well as many proofs in 
real analysis that invoke the least upper bound. All of 
these proofs are invalid for a constructivist, and sev- 
eral mathematicians have tried to save the theorems 
by finding constructivist proofs for them. For instance, 
both Weyl and Kneser worked on constructivist proofs 
for the fundamental theorem of algebra. 

It is easy to give instances of the use of PEM that a 
constructivist will not accept: one just has to apply it 
to any unsolved mathematical problem. For example, 
Catalan’s constant is the number 
\[ K = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)^2}. \]

It is not known whether K is transcendental, so if p is 
the statement “Catalan’s constant is transcendental,” 
then a constructivist will not accept that p is either true 
or false. 

This may seem odd, or even obviously wrong, until 
one realizes that constructivists have a different view 
about what truth is. For a constructivist, to say that a 
proposition is true simply means that we can prove it 
in accordance with the stringent methods that we are 
discussing; to say that it is false means that we can 
actually exhibit a counterexample to it. Since there is 
no reason to suppose that every existence statement 
has either a strict constructivist proof or an explicit 
counterexample, there is no reason to believe PEM (with 
this notion of truth). Thus, in order to establish the 
existence of a natural number with a certain prop- 
erty, a proof by reductio ad absurdum is not enough. 
Existence must be shown by explicit determination or 
construction if you want to persuade a constructivist. 
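
The difference between the two notions of truth can be seen concretely in a proof assistant. The following is a minimal sketch in Lean 4; the theorem names not_not_em and by_reductio are illustrative choices, not taken from the text. The first statement, that it is contradictory to deny p ∨ ¬p, has a direct constructive proof. The second, which passes from a refutation of ¬p to p itself, is the reductio step discussed above, and here it goes through only by appealing to Lean's classical axioms.

```lean
-- Constructively provable: denying p ∨ ¬p leads to a contradiction.
theorem not_not_em (p : Prop) : ¬¬(p ∨ ¬p) :=
  fun h => h (Or.inr (fun hp => h (Or.inl hp)))

-- Proof by reductio: from a refutation of ¬p, conclude p.
-- This step relies on the classical axioms (Classical.byContradiction).
theorem by_reductio (p : Prop) (h : ¬p → False) : p :=
  Classical.byContradiction h
```

The first proof never asserts p or ¬p outright, which is why a constructivist can accept it while still declining to assert PEM.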

Notice also how this viewpoint implies that math- 
ematics is not timeless or ahistorical. It was only in 
1882 that Lindemann proved that π is a transcendental 
number [III.43]. Since that date, it has been possible 
to assign a truth value to statements that were 
neither true nor false before, according to 
intuitionists. This may seem paradoxical, but it was just right 
for Brouwer, since in his view mathematical objects 
are mental constructions and he rejected as “meta- 
physics” the assumption that they have an independent 
existence. 

In 1918, Brouwer replaced the sets of Cantor and 
Zermelo by constructive counterparts, which he would 
later call “spreads” and “species.” A species is basically 
a set that has been defined by a characteristic prop- 
erty, but with the proviso that every element has been 
previously and independently defined by an explicit 
construction. In particular, the definition of any given 
species will be strictly predicative. 

The concept of a spread is particularly characteris- 
tic of intuitionism, and it forms the basis for Brouwer’s 
definition of the continuum. It is an attempt to avoid 
idealization and do justice to the temporal nature 
of mathematical constructions. Suppose, for example, 
that we wish to define a sequence of rational num- 
bers that gives better and better approximations to 
the square root of 2. In classical analysis, one con- 
ceives of such sequences as existing in their entirety, 
but Brouwer defined a notion that he called a choice 
sequence, which pays more attention to how they might 
be produced. One way to produce them is to give a rule, 
such as the recurrence relation x_{n+1} = (x_n^2 + 2)/(2x_n) 
(and the initial condition x_1 = 2). But another is to 
make less rigidly determined choices that obey certain 
constraints: for instance, one might insist that x_n has 
denominator n and that x_n^2 differs from 2 by at most 
100/n, which does not determine x_n uniquely but does 
ensure that the sequence produces better and better 
approximations to √2. 
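
The two ways of producing such a sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, exact rational arithmetic is used so that no rounding enters, and the "free choice" is modeled by a caller-supplied function pick, which is our device and not Brouwer's. We also read "has denominator n" as "can be written as k/n with k a nonnegative integer."

```python
from fractions import Fraction


def rule_sequence(length):
    """Lawlike sequence: x_1 = 2 and x_{n+1} = (x_n^2 + 2) / (2 x_n)."""
    x = Fraction(2)
    terms = [x]
    while len(terms) < length:
        x = (x * x + 2) / (2 * x)
        terms.append(x)
    return terms


def admissible(n):
    """All fractions k/n (k >= 0) whose square differs from 2 by at most 100/n."""
    bound = Fraction(100, n)
    # (k/n)^2 <= 2 + 100/n <= 102 forces k <= 11n, so the search is finite.
    return [Fraction(k, n) for k in range(11 * n + 1)
            if abs(Fraction(k, n) ** 2 - 2) <= bound]


def choice_sequence(length, pick):
    """Build x_1, ..., x_length, letting `pick` choose among the admissible values."""
    return [pick(admissible(n)) for n in range(1, length + 1)]
```

Calling rule_sequence(4) yields 2, 3/2, 17/12, 577/408, whereas choice_sequence(5, random.choice) (after importing random) may return a different list on every run: the constraint narrows the options at each stage without ever fixing the sequence in advance.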

A choice sequence is therefore not required to be 
completely specified from the outset, and it can involve 
choices that are freely made by the mathematician at 
different moments in time. Both these features make 
choice sequences very different from the sequences 
of classical analysis: it has been said that intuitionist 
mathematics is “mathematics in the making.” By con- 
trast, classical mathematics is marked by a kind of 
timeless objectivity, since its objects are assumed to 
be fully determined in themselves and independent of 
the thinking processes of mathematicians. 

A spread has choice sequences as its elements: it is 
something like a law that regulates how the sequences 
are constructed.⁶ For instance, one could take a spread 
that consisted of all choice sequences that began in 
some particular way, and such a spread would represent 
a segment; in general, spreads do not represent 
isolated elements, but continuous domains. By using 
spreads whose elements satisfy the Cauchy condition, 
Brouwer offered a new mathematical conception of the 
continuum: rather than being made up of points (or real 
numbers) with some previous Platonic existence, it was 
more genuinely “continuous.” Interestingly, this view 
is reminiscent of Aristotle, who, twenty-three centuries 
earlier, had emphasized the priority of the continuum 
and rejected the idea that an extended continuum can 
be made up of unextended points. 

6. More precisely, a spread is defined by means of two laws; see 
Heyting (1956), or more recently van Atten (2003), for further details 
on this and other points. One can picture a spread as a subtree of the 
universal tree of natural numbers (consisting of all finite sequences of 
natural numbers), together with an assignment of previously available 
mathematical objects to the nodes. One law of the spread determines 
nodes in the tree, the other maps them to objects. 

The next stage in Brouwer’s redevelopment of analysis 
was to analyze the idea of a function. Brouwer 
defined a function to be an assignment of values to the 
elements of a spread. However, because of the nature of 
spreads, this assignment had to be wholly dependent 
on an initial segment of the choice sequence in order 
to be constructively admissible. This threw up a big 
surprise: all functions that are everywhere defined are 
continuous (and even uniformly continuous). What, you 
might wonder, about the function f where f(x) = 0 
when x < 0 and f(x) = 1 when x ≥ 0? For Brouwer, 
this is not a well-defined function, and the underlying 
reason for this is that one can determine spreads for 
which we do not know (and may never know) whether 
they are positive, zero, or negative. For instance, one 
could let x_n be 1 if all the even numbers between 4 and 
2n are sums of two primes, and −1 otherwise. 

The rejection of PEM has the effect that intuitionistic 
negation differs in meaning from classical negation. 
Thus, intuitionistic arithmetic is also different from 
classical arithmetic. Nevertheless, in 1933 Gödel and 
Gentzen were able to show that the Dedekind-Peano 
axioms [III.69] of arithmetic are consistent relative to 
formalized intuitionistic arithmetic. (That is, they were 
able to establish a correspondence between the 
sentences of both formal systems, such that a 
contradiction in classical arithmetic yields a contradiction in its 
intuitionistic counterpart; thus, if the latter is 
consistent, the former must be as well.) This was a small 
triumph for the Hilbertians, though corresponding proofs 
for systems of analysis or set theory have never been 
found. 
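
The correspondence between classical and intuitionistic sentences is usually presented today as the Gödel-Gentzen negative (double-negation) translation. The sketch below follows that standard textbook formulation rather than the 1933 papers themselves, and the encoding of formulas as nested tuples, together with the function names, is an assumption made purely for illustration. The idea is to rewrite every "or" and "there exists" in terms of "not," "and," and "for all," so that no constructive witness is ever demanded.

```python
def neg(formula):
    """Wrap a formula in a negation."""
    return ("not", formula)


def translate(f):
    """Negative translation of a formula given as nested tuples.

    Formulas: ("atom", name), ("not", A), ("and", A, B), ("or", A, B),
    ("imp", A, B), ("forall", var, A), ("exists", var, A).
    """
    tag = f[0]
    if tag == "atom":
        return neg(neg(f))                      # P      becomes  not not P
    if tag == "not":
        return neg(translate(f[1]))
    if tag in ("and", "imp"):
        return (tag, translate(f[1]), translate(f[2]))
    if tag == "or":                             # A or B becomes  not(not A' and not B')
        return neg(("and", neg(translate(f[1])), neg(translate(f[2]))))
    if tag == "forall":
        return ("forall", f[1], translate(f[2]))
    if tag == "exists":                         # exists x A becomes  not forall x not A'
        return neg(("forall", f[1], neg(translate(f[2]))))
    raise ValueError(f"unknown connective: {tag!r}")
```

For example, translate(("or", p, ("not", p))) with p = ("atom", "p") produces a formula of the shape "not (not A and not B)", which no longer claims that one of the two alternatives can actually be exhibited.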


Initially there had been hopes that the develop- 
ment of intuitionism would end in a simple and ele- 
gant presentation of pure mathematics. However, as 
Brouwer’s reconstruction developed in the 1920s, it 
became more and more clear that intuitionistic analysis 
was extremely complicated and foreign. Brouwer was 
not worried, for, as he would say in 1933, “the spheres 
of truth are less transparent than those of illusion.” But 
Weyl, although convinced that Brouwer had delineated 
the domain of mathematical intuition in a completely 
satisfactory way, remarked in 1925: “the mathemati- 
cian watches with pain the largest part of his tower- 
ing theories dissolve into mist before his eyes.” Weyl 
seems to have abandoned intuitionism shortly there- 
after. Fortunately, there was an alternative approach 
that suggested another way of rehabilitating classical 
mathematics. 

3.2 Hilbert’s Program 

This alternative approach was, of course, Hilbert’s pro- 
gram, which promised, in the memorable phrasing of 
1928, “to eliminate from the world once and for all the 
skeptical doubts” as to the acceptability of the classical 
theories of mathematics. The new perspective, which 
he started to develop in 1904, relied heavily on formal 
logic and a combinatorial study of the formulas that are 
provable from given formulas (the axioms). With mod- 
ern logic, proofs are turned into formal computations 
that can be checked mechanically, so that the process 
is purely constructivistic. 

In the light of our previous discussion (section 1), 
it is interesting that the new project was to employ 
Kroneckerian means for a justification of modern, anti- 
Kroneckerian methodology. Hilbert’s aim was to show 
that it is impossible to prove a contradictory formula 
from the axioms. Once this had been shown combina- 
torially or constructively (or, as Hilbert also said, fini- 
tarily), the argument can be regarded as a justification 
of the axiom system— even if we read the axioms as 
talking about non-Kroneckerian objects like the real 
numbers or transfinite sets. 

Still, Hilbert’s ideas at the time were marred by a 
deficient understanding of logical theory.⁷ It was only in 
1917-18 that Hilbert returned to this topic, now with 
a refined understanding of logical theory and a greater 
awareness of the considerable technical difficulties of 
his project. Other mathematicians played very 
significant parts in promoting this better understanding. 
By 1921, helped by his assistant Bernays, Hilbert had 
arrived at a very refined conception of the 
formalization of mathematics, and had perceived the need for a 
deeper and more careful probing into the logical 
structure of mathematical proofs and theories. His program 
was first clearly formulated in a talk at Leipzig late in 
1922. 

7. The logic he presented in 1905 lagged far behind Frege’s system 
of 1879 or Peano’s of the 1890s. We do not enter into the development 
of logical theory in this period (see, for example, Moore 1998). 

Here we will describe the mature form of Hilbert’s 
program, as it was presented for instance in the 1925 
paper “On the infinite” (see van Heijenoort 1967). The 
main goal was to establish, by means of syntactic con- 
sistency proofs, the logical acceptability of the princi- 
ples and modes of inference of modern mathematics. 
Axiomatics, logic, and formalization made it possible 
to study mathematical theories from a purely mathe- 
matical standpoint (hence the name metamathematics), 
and Hilbert hoped to establish the consistency of the 
theories by employing very weak means. In particular, 
Hilbert hoped to answer all of the criticisms of Weyl 
and Brouwer, and thereby justify set theory, the clas- 
sical theory of real numbers, classical analysis, and of 
course classical logic with its PEM (the basis for indirect 
proofs by reductio ad absurdum). 

The whole point of Hilbert’s approach was to make 
mathematical theories fully precise, so that it would 
become possible to obtain precise results about their 
properties. The following steps are indispensable for 
the completion of such a program. 

(i) Finding suitable axioms and primitive concepts for 
a mathematical theory T, such as that of the real 
numbers. 

(ii) Finding axioms and inference rules for classical 
logic, which makes the passage from given propo- 
sitions to new propositions a purely syntactic, 
formal procedure. 

(iii) Formalizing T by means of the formal logical 
calculus, so that propositions of T are just strings of 
symbols, and proofs are sequences of such strings 
that obey the formal rules of inference. 

(iv) A finitary study of the formalized proofs of T that 
shows that it is impossible for a string of symbols 
that expresses a contradiction to be the last line 
of a proof. 
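
Steps (ii) to (iv) can be illustrated in miniature. The following Python sketch is a toy and makes several assumptions of its own: formulas are plain strings, implication is written "(A->B)", the string "F" stands for a contradiction, the only rule of inference is modus ponens, and the axioms are a finite list of concrete strings rather than axiom schemes. None of these conventions comes from Hilbert. What the sketch does show is that checking a proof is a purely mechanical matter of comparing strings.

```python
def is_valid_proof(lines, axioms):
    """Each line must be an axiom or follow from two earlier lines by modus ponens."""
    seen = []
    for line in lines:
        by_axiom = line in axioms
        # Modus ponens: some earlier A and the earlier string "(A->line)".
        by_mp = any(f"({a}->{line})" in seen for a in seen)
        if not (by_axiom or by_mp):
            return False
        seen.append(line)
    return True


def proves(lines, axioms, target):
    """True if `lines` is a valid proof whose last line is `target`."""
    return bool(lines) and lines[-1] == target and is_valid_proof(lines, axioms)


def theorems(axioms):
    """Close a finite set of axioms under modus ponens (terminates for this toy system)."""
    known = set(axioms)
    changed = True
    while changed:
        changed = False
        for a in list(known):
            prefix = f"({a}->"
            for imp in list(known):
                if imp.startswith(prefix) and imp.endswith(")"):
                    consequent = imp[len(prefix):-1]
                    if consequent not in known:
                        known.add(consequent)
                        changed = True
    return known


def is_consistent(axioms):
    """Toy version of step (iv): the contradiction "F" is not derivable."""
    return "F" not in theorems(axioms)
```

With axioms {"p", "(p->q)", "(q->r)"}, the list ["p", "(p->q)", "q", "(q->r)", "r"] is accepted as a proof of "r", and is_consistent returns True. In this toy the set of theorems is finite and can simply be enumerated; for a theory such as arithmetic it is infinite, which is why step (iv) calls for a genuine argument about all possible proofs rather than a search.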

In fact, steps (ii) and (iii) can be solved with rather 
simple systems formalized in first-order logic, like those 
studied in any introduction to mathematical logic, such 
as Dedekind-Peano arithmetic or Zermelo-Fraenkel set 
theory. It turns out that first-order logic is enough for 
codifying mathematical proofs, but, interestingly, this 
realization came rather late, after Gödel’s theorems 
[V.18]. 

Hilbert’s main insight was that, when theories are 
formalized, any proof becomes a finite combinatorial 
object: it is just an array of strings of symbols com- 
plying with the formal rules of the system. As Bernays 
said, this was like “projecting” the deductive structure 
of a theory T into the number-theoretic domain, and it 
became possible to express in this domain the 
consistency of T. These realizations raised hopes that a 
finitary study of formalized proofs would suffice to 
establish the consistency of the theory, that is, to prove the 
sentence expressing the consistency of T. But this hope, 
not warranted by the previous insights, turned out to 
be wrong.⁸ 
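
One way to picture the "projection" of strings of symbols into the number-theoretic domain is a prime-power coding of the kind later made famous by Gödel. The sketch below is a common textbook variant and not the coding of any particular paper; the nine-symbol alphabet and the function names are hypothetical. The k-th symbol of a string is recorded as the exponent of the k-th prime, so that unique factorization lets us recover the string from the number.

```python
ALPHABET = "0S+*=()~>"  # hypothetical symbols; position i gets code i + 1


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short strings)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def encode(formula, alphabet=ALPHABET):
    """Code a string as the product of (k-th prime) ** (code of k-th symbol)."""
    number = 1
    for p, symbol in zip(primes(), formula):
        number *= p ** (alphabet.index(symbol) + 1)
    return number


def decode(number, alphabet=ALPHABET):
    """Recover the string by reading off the exponents of successive primes."""
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if not 1 <= exponent <= len(alphabet):
            raise ValueError("not the code of a string over this alphabet")
        symbols.append(alphabet[exponent - 1])
    return "".join(symbols)
```

Under these conventions the string "0=0" has code 2^1 · 3^5 · 5^1 = 2430, and decode(2430) returns "0=0". Statements about strings and proofs thereby become statements about natural numbers, which is what makes it possible to express the consistency of T inside arithmetic.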

Also, a crucial presupposition of the program was 
that not only the logical calculus but also each of the 
axiomatic systems would be complete. Roughly 
speaking, this means that they would be sufficiently 
powerful to allow the derivation of all the relevant results.⁹ 
This assumption turned out to be wrong for systems 
that contain (primitive recursive) arithmetic, as Gödel 
showed. 

It remains to say a bit more about what Hilbert meant 
by finitism (for details, see Tait 1981). This is one of 
the points in which his program of the 1920s adopted 
to some extent the principles of intuitionists such as 
Poincaré and Brouwer and deviated strongly from the 
ideas Hilbert himself had considered in 1900. The key 
idea is that, contrary to the views of logicists like 
Frege and Dedekind, logic and pure thought require 
something that is given “intuitively” in our immediate 
experience: the signs and formulas. 

In 1905, Poincaré had put forth the view that a 
formal consistency proof for arithmetic would be 
circular, as such a demonstration would have to proceed by 
induction on the length of formulas and proofs, and 
thus would rely on the same axiom of induction that it 
was supposed to establish. Hilbert replied in the 1920s 
that the form of induction required at the 
metamathematical level is much weaker than full arithmetical 
induction, and that this weak form is grounded on the 
finitary consideration of signs that he took to be 
intuitively given. Finitary mathematics was not in need of 
any further justification or reduction. 

8. For further details, see, for example, Sieg (1999). 

9. The notion of “relevant result” should of course be made precise: 
doing so leads to the notion either of syntactic completeness or of 
semantic completeness. 

Hilbert’s program proceeded gradually by studying 
weak theories at first and proceeding to progressively 
stronger ones. The metatheory of a formal system stud- 
ies properties such as consistency, completeness, and 
some others (“completeness” in the logical sense means 
that all true or valid formulas that can be represented 
in the calculus are formally deducible in it). Proposi- 
tional logic was quickly proved to be consistent and 
complete. First-order logic, also known as predicate 
logic, was proved complete by Gödel in his dissertation 
of 1929. For all of the 1920s, the attention of Hilbert 
and coworkers was set on elementary arithmetic and its 
subsystems; once this had been settled, the project was 
to move on to the much more difficult, but crucial, cases 
of the theory of real numbers and set theory. Acker- 
mann and von Neumann were able to establish consis- 
tency results for certain subsystems of arithmetic, but 
between 1928 and 1930 Hilbert was convinced that the 
consistency of arithmetic had already been established. 
Then came the severe blow of Gödel’s incompleteness 
results (see section 4). 

The name “formalism,” as a description of this 
program, came from the fact that Hilbert’s method 
consisted in formalizing each mathematical theory, and 
formally studying its proof structure. However, this 
name is rather one-sided and even confusing, 
especially because it is usually contrasted with 
intuitionism, a full-blown philosophy of mathematics. Like most 
mathematicians, Hilbert never viewed mathematics as 
a mere game played with formulas. Indeed, he often 
emphasized the meaningfulness of (informal) 
mathematical statements and the depth of conceptual 
content expressed in them.¹⁰ 

3.3 Personal Disputes 

The crisis was unfolding not just at an intellectual level 
but also at a personal level. One should perhaps tell 
this story as a tragedy, in which the personalities of 
the main figures and the successive events made the 
final result quite inescapable. 

Hilbert and Brouwer were very different 
personalities, though they were both extremely willful and clever 
men. Brouwer’s worldview was idealistic and tended 
to solipsism. He had an artistic temperament and an 
eccentric private life. He despised the modern world, 
looking to the inner life of the self as the only way 
out (at least in principle, though not always in 
practice). He preferred to work in isolation, although he had 
good friends in the mathematical community, 
especially in the international group of topologists that 
gathered around him. Hilbert was typically modernist 
in his views and attitudes; full of optimism and 
rationalism, he was ready to lead his university, his country, 
and the international community into a new world. He 
was very much in favor of collaboration, and felt happy 
to join Klein’s schemes for institutional development 
and power. 

10. This is very explicit, for example, in the lectures of 1919-20 
edited by Rowe (1992), and also in the 1930 paper that bears exactly 
the same title (see Gesammelte Abhandlungen, volume 3). 

As a consequence of World War I, Germans in the 
early 1920s were not allowed to attend the Interna- 
tional Congresses of Mathematicians. When the oppor- 
tunity finally arose in 1928, Hilbert was eager to seize 
on it, but Brouwer was furious because of restrictions 
that were still imposed on the German delegation and 
sent a circular letter in order to convince other math- 
ematicians. Their viewpoints were widely known and 
led to a clash between the two men. On another level, 
Hilbert had made important concessions to his oppo- 
nents in the 1920s, hoping that he would succeed in 
his project of finding a consistency proof. Brouwer 
emphasized these concessions, accusing him of 
failing to recognize authorship, and demanded new 
concessions.¹¹ Hilbert must have felt insulted and 
perhaps even threatened by a man whom he regarded 
as perhaps the greatest mathematician of the younger 
generation. 

The last straw came with an episode in 1928. Brouwer 
had since 1915 been a member of the editorial board 
of Mathematische Annalen, the most prestigious 
mathematics journal at the time, of which Hilbert had been 
the main editor since 1902. Ill with “pernicious 
anemia,” and apparently thinking that he was close to the 
end, Hilbert feared for the future of his journal and 
decided it was imperative to remove Brouwer from the 
editorial board. When he wrote to other members of the 
board explaining his scheme, which he was already 
carrying out, Einstein replied saying that his proposal was 
unwise and that he wanted to have nothing to do with 
it. Other members, however, did not wish to upset the 
old and admired Hilbert. Finally, a dubious procedure 
was adopted, where the whole board was dissolved and 
created anew. Brouwer was greatly disturbed by this 
action, and as a result of it the journal lost Einstein and 
Carathéodory, who had previously been main editors 
(see van Dalen 2005). 

11. See his “Intuitionistic reflections on formalism” of 1928 (in 
Mancosu 1998). 

After that, Brouwer ceased to publish for some years, 
leaving some book plans unfinished. With his disap- 
pearance from the scene, and with the gradual disap- 
pearance of previous political turbulences, the feelings 
of “crisis” began to fade away (see Hesseling 2003). 
Hilbert did not intervene much in the subsequent 
debates and foundational developments. 

4 Gödel and the Aftermath 

It was not only the Annalen war that Hilbert won: the 
mathematical community as a whole continued to work 
in the style of modern mathematics. And yet his 
program suffered a profound blow with the publication 
of Gödel’s famous 1931 article in the Monatshefte für 
Mathematik und Physik. An extremely ingenious 
development of metamathematical methods, the 
arithmetization of metamathematics, allowed Gödel to prove 
that systems like axiomatic set theory or 
Dedekind-Peano arithmetic are incomplete (see Gödel’s theorem 
[V.18]). That is, there exist propositions P formulated 
strictly in the language of the system such that neither 
P nor ¬P is formally provable in the system. 

This theorem already presented a deep problem for 
Hilbert’s endeavor, as it shows that formal proof cannot 
even capture arithmetical truth. But there was more. 
A close look at Gödel’s arguments made it clear that 
this first metamathematical proof could itself be 
formalized, which led to “Gödel’s second theorem”: that 
it is impossible to establish the consistency of the 
systems mentioned above by any proof that can be 
codified within them. Gödel’s arithmetization of 
metamathematics makes it possible to build a sentence, in the 
language of formal arithmetic, that expresses the 
consistency of this same formal system. And this sentence 
turns out to be among those that are unprovable.¹² 
To express it contrapositively, a finitary formal proof 
(codifiable in the system of formal arithmetic) of the 
impossibility of proving 1 = 0 could be transformed 
into a contradiction of the system! Thus, if the 
system is indeed consistent (as most mathematicians are 
convinced it is), then there is no such finitary proof. 

According to what Gödel called at the time “the 
von Neumann conjecture” (namely, that if there is a 
finitary proof of consistency, then it can be formalized 
and codified within elementary arithmetic), the second 
theorem implies the failure of Hilbert’s program (see 
Mancosu (1999, p. 38) and, for more on the reception, 
Dawson (1997, pp. 68 ff)). One should emphasize that 
Gödel’s negative results are purely constructivistic and 
even finitistic, valid for all parties in the foundational 
debate. They were difficult to digest, but in the end 
they led to a reestablishment of the basic terms for 
foundational studies. 

12. For further details, see, for example, Smullyan (2001), van 
Heijenoort (1967), and good introductions to mathematical logic. Both 
theorems were carefully proved in Hilbert and Bernays (1934/39). Bad 
expositions and faulty interpretations of Gödel’s results abound. 

Mathematical logic and foundational studies con- 
tinued to develop brilliantly with Gentzen-style proof 
theory, with the rise of model theory [IV.2], etc.— all 
of which had their roots in the foundational studies of 
the first third of the twentieth century. Although the 
Zermelo-Fraenkel axioms suffice for giving a rigorous 
foundation to most of today’s mathematics, and have a 
rather convincing intuitive justification in terms of the 
“iterative” conception of sets,¹³ there is a general 
feeling that foundational studies, instead of achieving their 
ambitious goal, “found themselves attracted into the 
whirl of mathematical activity, and are now enjoying 
full voting rights in the mathematical senate.”¹⁴ 

However, this impression is somewhat superficial. 
Proof theory has developed, leading to noteworthy 
reductions of classical theories to systems that can be 
regarded as constructive. A striking example is that 
analysis can be formalized in conservative extensions 
of arithmetic: that is, in systems that extend the lan- 
guage of arithmetic while including all theorems of 
arithmetic, but which are “conservative” in the sense 
that they have no new consequences in the language of 
arithmetic. Some parts of analysis can even be devel- 
oped in conservative extensions of primitive recursive 
arithmetic (see Feferman 1998). This raises questions 
about the philosophical bases on which the 
admissibility of the relevant constructive theories can be founded. 
But for these systems the question is far less simple 
than it was for Hilbert’s finitary mathematics; it seems 
fair to say that no general consensus has yet been 
reached. 

Whatever its roots and justification may be, 
mathematics is a human activity. This truism is clear from the 
subsequent development of our story. The 
mathematical community refused to abandon “classical” ideas and 
methods; the constructivist “revolution” was aborted. 
In spite of its failure, formalism established itself in 
practice as the avowed ideology of twentieth-century 
mathematicians. Some have remarked that formalism 
was less a real faith than a Sunday refuge for those who 
spent their weekdays working on mathematical objects 
as something very real. The Platonism of working days 
was only abandoned, as a Bourbaki [VI.96] member 
said, when a ready-made reply was needed to 
unwelcome philosophical questions concerning 
mathematical knowledge. 

13. The basic idea is to view the set-theoretic universe as a product 
of iterating the following operation: one starts with a basic domain 
V_0 (possibly finite or even equal to ∅) and forms all possible sets of 
elements in the domain; this gives a new domain V_1, and one iterates 
forming sets of V_0 ∪ V_1, and so on (to infinity and beyond!). This 
produces an open-ended set-theoretic universe, masterfully described by 
Zermelo (1930). On the iterative conception, see, for example, the last 
papers in Benacerraf and Putnam (1983). 

14. To use the words of Gian-Carlo Rota in an essay of 1973. 

One should note that formalism suited very well the 
needs of a self-conscious, autonomous community of 
research mathematicians. It granted them full freedom 
to choose their topics and to employ modern meth- 
ods to explore them. However, to reflective mathemat- 
ical minds it has long been clear that it is not the 
answer. Epistemological questions about mathematical 
knowledge have not been “eliminated from the world”; 
philosophers, historians, cognitive scientists, and oth- 
ers keep looking for more adequate ways of under- 
standing its content and development. Needless to say, 
this does not threaten the autonomy of mathematical 
researchers— if autonomy is to be a concern, perhaps 
we should worry instead about the pressures exerted 
on us by the market and other powers. 

Both (semi-)constructivism and modern mathemat- 
ics have continued to develop: the contrast between 
them has simply been consolidated, though in a very 
unbalanced way, since some 99% of practicing mathe- 
maticians are “modern.” (But do statistics matter when 
it comes to the correct methods for mathematics?) In 
1905, commenting on the French debate, Hadamard 
[VI.65] wrote that “there are two conceptions of math- 
ematics, two mentalities, in evidence.” It has now come 
to be recognized that there is value in both approaches: 
they complement each other and can coexist peacefully. 
In particular, interest in effective methods, algorithms, 
and computational mathematics has grown markedly 
in recent decades, and all of these are closer to the
constructivist tradition. 

The foundational debate left a rich legacy of ideas 
and results, key insights and developments, including 
the formulation of axiomatic set theories and the rise 
of intuitionism. One of the most important of these 
developments was the emergence of modern mathe- 
matical logic as a refinement of axiomatics, which led to 
the theories of recursion and computability in around
1936 (see algorithms [II.4 §3.2]). In the process, our
understanding of the characteristics, possibilities, and 
limitations of formal systems was hugely clarified. 

One of the hottest issues throughout the whole 
debate, and probably its main source, was the question 
of how to understand the continuum. The reader may 
recall the contrast between the set-theoretic under- 
standing of the real numbers and Brouwer’s approach, 
which rejected the idea that the continuum is “built 
of” points. That this is a labyrinthine question was 
further established by results on Cantor’s continuum 
hypothesis (CH), which postulates that the cardinality 
of the set of real numbers is $\aleph_1$, the second transfinite
cardinal, or equivalently that every infinite subset
of $\mathbb{R}$ must biject with either $\mathbb{N}$ or with $\mathbb{R}$ itself. Gödel
proved in 1939 that CH is consistent with axiomatic 
set theory, but Paul Cohen proved in 1963 that it 
is independent of its axioms (i.e., Cohen proved that 
the negation of CH is consistent with axiomatic set 
theory [IV.1 §5]). The problem is still alive, with a few
mathematicians proposing alternative approaches to
the continuum and others trying to find new and
convincing set-theoretic principles that will settle Cantor’s
question (see Woodin 2001). 

The foundational debate has also contributed in 
a definitive way to clarifying the peculiar style and 
methodology of modern mathematics, especially the 
so-called Platonism or existential character of modern 
mathematics (see the classic 1935 paper of Bernays 
in Benacerraf and Putnam (1983)), by which is meant 
(here at least) a methodological trait rather than any 
supposed implications of metaphysical existence. Mod- 
ern mathematics investigates structures by consider- 
ing their elements as given independently of human 
(or mechanical) capabilities of effective definition and 
construction. This may seem surprising, but perhaps 
this trait can be explained by broader characteristics of 
scientific thought and the role played by mathematical 
structures in the modeling of scientific phenomena. 

In the end, the debate made it clear that mathematics 
and its modern methods are still surrounded by impor- 
tant philosophical problems. When a sizable amount of 
mathematical knowledge can be taken for granted, the- 
orems can be established and problems can be solved 
with the certainty and clarity for which mathematics is 
celebrated. But when it comes to laying out the bare 
beginnings, philosophical issues cannot be avoided. 
The reader of these pages may have felt this at several 
places, especially in the discussion of intuitionism, but 
also in the basic ideas behind Hilbert’s program, and
of course in the problem of the relationship between
formal mathematics and its informal counterpart, a
problem that is brought into sharp focus by Gödel’s
theorems.

I thank Mark van Atten, Jeremy Gray, Paolo Mancosu, 
José F. Ruiz, Wilfried Sieg, and the editors for their helpful 
comments on a previous version of this paper.

Part III

Mathematical Concepts 


III.1 The Axiom of Choice


Consider the following problem: it is easy to find two
irrational numbers $a$ and $b$ such that $a + b$ is rational,
or such that $ab$ is rational (in both cases one could take
$a = \sqrt{2}$ and $b = -\sqrt{2}$), but is it possible for $a^b$ to be
rational? Here is an elegant proof that the answer is yes.
Let $x = \sqrt{2}^{\sqrt{2}}$. If $x$ is rational then we have our example.
But $x^{\sqrt{2}} = \sqrt{2}^{2} = 2$ is rational, so if $x$ is irrational then
again we have an example.

Now this argument certainly establishes that it is
possible for $a$ and $b$ to be irrational and for $a^b$ to
be rational. However, the proof has a very interesting
feature: it is nonconstructive, in the sense that it does
not actually name two irrationals $a$ and $b$ that work.
Instead, it tells us that either we can take $a = b = \sqrt{2}$
or we can take $a = \sqrt{2}^{\sqrt{2}}$ and $b = \sqrt{2}$. Not only does it
not tell us which of these alternatives will work, it gives
us absolutely no clue about how to find out.

Some philosophers and philosophically inclined 
mathematicians have been troubled by arguments of 
this kind, but as far as mainstream mathematics goes 
they are a fully accepted and important style of rea- 
soning. Formally, we have appealed to the “law of the 
excluded middle.” We have shown that the negation of 
the statement cannot be true, and deduced that the 
statement itself must be true. A typical reaction to 
the proof above is not that it is in any sense invalid, 
but merely that its nonconstructive nature is rather 
surprising. 

Nevertheless, faced with a nonconstructive proof, it 
is very natural to ask whether there is a constructive 
proof. After all, an actual construction may give us 
more insight into the statement, which is an impor- 
tant point since we prove things not only to be sure 
they are true but also to gain an idea of why they are 
true. Of course, to ask whether there is a constructive 
proof is not to suggest that the nonconstructive proof 


is invalid, but just that it may be more informative to 
have a constructive proof. 

The axiom of choice is one of several rules that we 
use for building sets out of other sets. Typical examples
of such rules are the statement that for any set $A$
we can form the set of all its subsets, and the statement
that for any set $A$ and any property $p$ we can form the
set of all elements of $A$ that satisfy $p$ (these are usually
called the power-set axiom and the axiom of comprehension,
respectively). Roughly speaking, the axiom of
choice says that we are allowed to make an arbitrary 
number of unspecified choices when we wish to form a 
set. 

Like the other axioms, the axiom of choice can seem
so natural that one may not even notice that one is
using it, and indeed it was applied by many mathematicians
before it was first formalized. To get an idea
of what it means, let us look at the well-known proof
that the union of a countable family of countable sets is
countable. The fact that the family is countable allows
us to write out the sets in a list $A_1, A_2, A_3, \dots$, and then
the fact that each individual set $A_n$ is countable allows
us to write its elements in a list $a_{n1}, a_{n2}, a_{n3}, \dots$. We
then finish the proof by finding some systematic way
of counting through the elements $a_{nm}$.
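
One possible "systematic way of counting" is to walk through the index pairs $(n, m)$ one antidiagonal at a time. The sketch below is only an illustration: the function names are hypothetical, and the explicit listing `multiples` (taking $A_n$ to be the positive multiples of $n$) is an assumption made so that there is something concrete to enumerate. Elements that belong to several sets are visited more than once, which does no harm when the aim is only to show that the union is countable.

```python
from itertools import count, islice


def diagonal_pairs():
    """Yield every pair (n, m) of positive integers exactly once,
    one antidiagonal n + m = 2, 3, 4, ... at a time."""
    for total in count(2):
        for n in range(1, total):
            yield n, total - n


def enumerate_union(listing):
    """Enumerate the union of the sets A_1, A_2, ..., where
    listing(n, m) returns the m-th element a_nm of A_n."""
    for n, m in diagonal_pairs():
        yield listing(n, m)


def multiples(n, m):
    # Assumed example listing: A_n is the set of positive multiples of n.
    return n * m


first_pairs = list(islice(diagonal_pairs(), 6))
first_elements = list(islice(enumerate_union(multiples), 10))
```

Here the listing of each $A_n$ is given by an explicit rule, so no choices are needed; the enumeration only becomes nonconstructive when no such rule is available.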

Now in that proof we actually made an infinite number
of unspecified choices. We were told that each $A_n$
was countable and then for each $A_n$ we “chose” a listing
of the elements of $A_n$ without specifying the choice we
had made. Moreover, since we are told absolutely nothing
about the sets $A_n$, it is clearly impossible to say how
we choose to list them. This remark does not invalidate
the proof, but it does show that it is nonconstructive.
(Note, however, that if we are actually told what the sets
$A_n$ are, then we may well be able to specify listings of
their elements and thereby give a constructive proof
that the union of those particular sets is countable.)

Here is another example. A graph [III.34] is bipartite
if its vertices can be split into two classes $X$ and $Y$ in
such a way that no two vertices in the same class are
connected by an edge. For example, any even cycle (an
even number of points arranged in a circle, with consecutive
points joined) is bipartite, while no odd cycle is.
Now, is an infinite disjoint union of even cycles bipartite?
Of course it is: we just split each of the individual
cycles $C$ into two classes $X_C$ and $Y_C$ and then let $X$ be
the union of the sets $X_C$ and $Y$ be the union of the sets
$Y_C$. But how do we choose for each cycle $C$ which set to
call $X_C$ and which to call $Y_C$? Again, we cannot actually
specify how we do this, so we are using the axiom of
choice (even if we do not explicitly say so).
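
The two possible splittings of a single even cycle can be made concrete in a short sketch. It assumes, purely for illustration, that the vertices carry labels 0, 1, ..., length − 1 in cyclic order; the function names and the `start` parameter are hypothetical. The labels are exactly the extra information that an abstract cycle does not supply: once they are given, "put vertex 0 in $X_C$" is a rule and no choice is required.

```python
def split_even_cycle(length, start=0):
    """Split the vertices 0..length-1 of an even cycle into two classes.

    The vertex `start` goes into the first class; the classes then
    alternate around the cycle.
    """
    if length % 2 != 0:
        raise ValueError("an odd cycle is not bipartite")
    x_class = {v for v in range(length) if (v - start) % 2 == 0}
    y_class = set(range(length)) - x_class
    return x_class, y_class


def is_valid_split(length, x_class):
    """Check that every edge {v, v+1} of the cycle joins the two classes."""
    return all(
        (v in x_class) != ((v + 1) % length in x_class)
        for v in range(length)
    )


x0, y0 = split_even_cycle(6, start=0)
x1, y1 = split_even_cycle(6, start=1)
```

The two calls return the same pair of classes with the names swapped, which is the only freedom there is for one cycle; the difficulty in the text arises from having to settle that swap for infinitely many unlabelled cycles at once.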

In general, the axiom of choice states that if we are
given a family of nonempty sets $X_i$, then we may select
an element $x_i$ from each one at once. More precisely, it
states that if the $X_i$ are nonempty sets, where $i$ ranges
over some index set $I$, then there is a function $f$ defined
on $I$ such that $f(i) \in X_i$ for all $i$. Such a function $f$ is
called a choice function for the family.

For one set we do not need any separate rule to do
this: indeed, the statement that a set $X_1$ is nonempty is
exactly the statement that there exists $x_1 \in X_1$. (More
formally, we might say that the function $f$ that takes 1
to $x_1$ is a choice function for the “family” that consists
of the single set $X_1$.) For two sets, and indeed for any
finite collection of sets, one can prove the existence of
a choice function by induction on the number of sets.
But for infinitely many sets it turns out that one cannot
deduce the existence of a choice function from the
other rules for building sets.

Why do people make a fuss about the axiom of 
choice? The main reason is that if it is used in a proof, 
then that part of the proof is automatically noncon- 
structive. This is reflected in the very statement of the 
axiom. For the other rules that we use, such as “one 
may take the union of two sets,” the set whose existence
is being asserted is uniquely defined by its properties
($u$ is an element of $X \cup Y$ if and only if it is an
element of $X$ or of $Y$ or of both). But this is not the case
with the axiom of choice: the object whose existence is
asserted (a choice function) is not uniquely specified by
its properties, and there will typically be many choice
functions.

For this reason, the general view in mainstream math- 
ematics is that, although there is nothing wrong with 
using the axiom of choice, it is a good idea to signal 
that one has used it, to draw attention to the fact that
one’s proof is not constructive. 

An example of a statement whose proof involves
the axiom of choice is the Banach-Tarski paradox
[V.3]. This says that there is a way of dividing up
a solid unit sphere into a finite number of subsets
and then reassembling these subsets (using rotations,
reflections, and translations) to form two solid unit
spheres. The proof does not provide an explicit way
of defining the subsets.

It is sometimes claimed that the axiom of choice
has “undesirable” or “highly counterintuitive” consequences,
but in almost all cases a little thought reveals
that the consequence under consideration is actually
not counterintuitive at all. For example, consider
the Banach-Tarski paradox above. Why does it seem
strange and paradoxical? It is because we feel that volume
has not been preserved. And indeed, this feeling
can be converted into a rigorous argument that the subsets
used in the decomposition cannot all be sets to
which one can meaningfully assign a volume. But that
is not a paradox at all: we can say what we mean by the
volume of a nice set such as a polyhedron, but there is
no reason to suppose that we can give a sensible definition
of volume for all subsets of the sphere. (The subject
called measure theory can be used to give a volume
to a very wide class of sets, called the measurable sets
[III.57], but there is no reason at all to believe that all
sets should be measurable, and indeed it can be shown,
again by a use of the axiom of choice, that there are sets
that are not measurable.)

There are two forms of the axiom of choice that are
more often used in daily mathematical life than the
basic form we have been discussing. One is the well-ordering
principle, which states that every set can be
well-ordered [III.68]. The other is Zorn’s lemma, which
states that under certain circumstances “maximal” elements
exist. For example, a basis for a vector space
is precisely a maximal linearly independent set, and it
turns out that Zorn’s lemma applies to the collection
of linearly independent sets in a vector space, which
shows that every vector space has a basis.

These two statements are called forms of the axiom 
of choice because they are equivalent to it, in the sense 
that each one both implies the axiom of choice and may 
be deduced from it, in the presence of the other rules 
for building sets. A good way of seeing why these two
other forms of the axiom have a nonconstructive feel
to them is to spend a few minutes trying to find a
well-ordering of the reals, or a basis for the vector space of
all sequences of real numbers.

For more about the axiom of choice, and especially 
about its relationship to the other axioms of formal set 
theory, see set theory [IV.1].





III.2 The Axiom of Determinacy 


Consider the following “infinite game.” Two players, 
A and B, take turns to name natural numbers, with A 
going first, say. By doing this, they generate an infinite 
sequence. A wins the game if this sequence is “eventually
periodic,” and B wins if it is not. (An eventually periodic
sequence is one like 1, 56, 4, 5, 8, 3, 5, 8, 3, 5, 8, 3, 5,
8, 3, ... , which settles down after a while to a recurring
pattern.) It is not hard to see that B has a winning strat- 
egy for this game, since eventually periodic sequences 
are rather special. However, there is never a point in 
the game at which B is guaranteed to win, since every 
finite sequence could be the beginning of an eventually 
periodic sequence. 

More generally, any collection S of infinite sequences 
of natural numbers gives rise to an infinite game: A’s
object is now to ensure that the sequence produced is
one of the sequences in $S$, and B’s object is to ensure
the reverse. The resulting game is called determined if
one of the two players has a winning strategy. As we
have seen, the game is certainly determined when $S$ is
the set of eventually periodic sequences, and indeed for
just about any set $S$ that one writes down it is easy to
see that the corresponding game is determined. Nevertheless,
it turns out that there are games that are not
determined. (It is an instructive exercise to see where 
the plausible-seeming argument, “If A does not have a 
winning strategy, then A cannot force a win, so B must 
have a winning strategy,” breaks down.) 

It is not too hard to construct nondetermined games, 
but the constructions use the axiom of choice [III.1]:
roughly speaking, one can well-order all possible strate- 
gies so that each one has fewer predecessors than there 
are infinite sequences, and select sequences to belong 
to S or its complement in a way that stops each strategy 
in turn from being a winning strategy for either player. 

The axiom of determinacy states that all games are
determined. It contradicts the axiom of choice, but it
is a rather interesting axiom when it is added to the
Zermelo-Fraenkel axioms [III.101] without choice. It
turns out, for example, to imply that many sets of
reals have surprisingly good properties, such as being
Lebesgue measurable. Variants of the axiom of determinacy
are closely connected with the theory of large
cardinals. For more details, see set theory [IV.1].


Banach Spaces 

See NORMED SPACES AND BANACH SPACES 
[III.64]


III.3 Bayesian Analysis


Suppose you throw a pair of standard dice. The probability
that the total is 10 is $\frac{1}{12}$ because there are thirty-six
ways the dice can come up, of which three (4 and
6, 5 and 5, and 6 and 4) give 10. If, however, you look
at the first die and see that it came up as a 6, then the
conditional probability that the total is 10, given this
information, is $\frac{1}{6}$ (since that is the probability that the
other die comes up as a 4).
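
The two figures for the dice can be checked by listing all thirty-six outcomes. The following is a minimal sketch using exact fractions; the helper `probability` and the event names are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))


def probability(event, given=lambda outcome: True):
    """Probability of `event` among the outcomes satisfying `given`."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))


def total_is_10(outcome):
    return outcome[0] + outcome[1] == 10


def first_is_6(outcome):
    return outcome[0] == 6


p_total_10 = probability(total_is_10)                      # 3/36 = 1/12
p_total_10_given_6 = probability(total_is_10, first_is_6)  # 1/6
```

Conditioning is implemented here simply by discarding the outcomes that are inconsistent with what has been observed and counting within what remains.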

In general, the probability of $A$ given $B$ is defined to
be the probability of $A$ and $B$ divided by the probability
of $B$. In symbols, one writes

$$P[A|B] = \frac{P[A \wedge B]}{P[B]}.$$

From this it follows that $P[A \wedge B] = P[A|B]\,P[B]$. Now
$P[A \wedge B]$ is the same as $P[B \wedge A]$. Therefore,

$$P[A|B]\,P[B] = P[B|A]\,P[A],$$

since the left-hand side is $P[A \wedge B]$ and the right-hand
side is $P[B \wedge A]$. Dividing through by $P[B]$ we obtain
Bayes’s theorem:

$$P[A|B] = \frac{P[B|A]\,P[A]}{P[B]},$$

which expresses the conditional probability of $A$ given
$B$ in terms of the conditional probability of $B$ given $A$.

A fundamental problem in statistics is to analyze random
data given by an unknown probability distribution
[III.73]. Here, Bayes’s theorem can make a significant
contribution. For example, suppose you are told
that an unknown number of unbiased coins between 1
and 10 have been tossed, and that three of them came
up heads. Suppose that you wish to guess how many
coins there were. Let $H_3$ stand for the event that three
coins came up heads and let $C$ be the number of coins.
Then for each $n$ between 1 and 10 it is not hard to calculate
the conditional probability $P[H_3 | C = n]$, but we
would like to know the reverse, namely $P[C = n | H_3]$.
Bayes’s theorem tells us that it is

$$P[H_3 | C = n]\,\frac{P[C = n]}{P[H_3]}.$$

This would tell us the ratios between the various conditional
probabilities $P[C = n | H_3]$ if we knew what the
probabilities $P[C = n]$ were. Typically, one does not
know this, but one makes some kind of guess, called a
prior distribution. For example, one might guess, before
knowing that three coins had come up heads, that
for each $n$ between 1 and 10 the probability that $n$
coins had been chosen was $\frac{1}{10}$. After this information,
one would use the calculation above to revise one’s
assessment and obtain a posterior distribution, in which
the probability that $C = n$ would be proportional to
$\frac{1}{10} P[H_3 | C = n]$.
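
The revision from prior to posterior in the coin example can be carried out explicitly. The sketch below assumes the uniform prior $\frac{1}{10}$ described above and reads "three of them came up heads" as "exactly three heads," so that $P[H_3 | C = n]$ is the binomial probability $\binom{n}{3}/2^n$; that reading, and all the variable names, are assumptions of this illustration.

```python
from fractions import Fraction
from math import comb


def likelihood(n, heads=3):
    """P[H_3 | C = n]: probability of exactly `heads` heads from n fair coins."""
    return Fraction(comb(n, heads), 2 ** n)


# Uniform prior: each n from 1 to 10 has probability 1/10.
prior = {n: Fraction(1, 10) for n in range(1, 11)}

# Numerators of Bayes's theorem, P[H_3 | C = n] P[C = n].
unnormalised = {n: likelihood(n) * prior[n] for n in prior}

# P[H_3], obtained by summing over all possible values of n.
evidence = sum(unnormalised.values())

posterior = {n: weight / evidence for n, weight in unnormalised.items()}
```

Under these assumptions the posterior gives probability 0 to $n = 1$ and $n = 2$, and it is largest at $n = 5$ and $n = 6$, which tie at $\frac{40}{227}$. A different prior would in general give a different posterior, which is why the choice of prior matters.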

There is more to Bayesian analysis than simply apply- 
ing Bayes’s theorem to replace prior distributions by 
posterior distributions. In particular, as in the exam- 
ple just given, there is not always an obvious prior 
distribution to take, and it is a subtle and interesting 
mathematical problem to devise methods for choosing 
prior distributions that are “optimal” in different ways. 
For further discussion, see mathematics and medical
statistics [VII.11] and mathematical statistics [VII.10].


III.4 Braid Groups 

F. E. A. Johnson 





Figure 1 Two 3-braids. 


Take two parallel planes, each punctured at n points. 
Label the holes 1 to n in each plane, and run a string 
from each hole in the first plane to one in the second, 
in such a way that no two strings go to the same hole. 
The result is an n-braid. Two different 3-braids, shown 
in two-dimensional projection in a similar manner to 
knot diagrams [III.46], are given in figure 1. 

As the diagrams suggest, we insist that the strings 
go from left to right without “doubling back”; so, for 
example, a knotted string is not allowed. 

In describing the “same” braid in different ways, a 
certain freedom is allowed. Subject to the restrictions 
that string ends remain fixed and that strings neither
break nor pass through each other, strings are allowed 
to stretch, contract, bend, and otherwise move about in 
three dimensions. This notion of “sameness” is called 
braid isotopy. 

Braids may be composed as follows: arrange a pair of 
braids end to end to abut in a common (middle) plane, 
join up the strings, and remove the middle plane. For 
the braids X and Y in figure 1, the composition XY is 
given in figure 2. 

With this notion of composition, $n$-braids form a
group $B_n$. In our example, $Y = X^{-1}$, since “pulling all
the strings tight” shows that $XY$ is isotopic to the trivial
braid (figure 3), which acts as the identity.

Figure 3 The trivial braid.

Figure 4 The generator $\sigma_i$.

As a group, $B_n$ is generated by elements $\sigma_1, \dots, \sigma_{n-1}$,
where $\sigma_i$ is formed from the trivial braid by crossing
the $i$th string over the $(i + 1)$st as in figure 4. The
reader may perceive a similarity between the $\sigma_i$ and the
adjacent transpositions that generate the group $S_n$ of
permutations [III.70] of $\{1, \dots, n\}$. Indeed, any braid
determines a permutation by the rule

$$i \mapsto \text{right-hand label of $i$th string}.$$

Ignoring everything except the behavior at the ends
gives a surjective homomorphism $B_n \to S_n$, which
maps $\sigma_i$ to the transposition $(i, i + 1)$. This is not an
isomorphism, however, as $B_n$ is infinite. In fact, $\sigma_i$ has
infinite order, whereas the transposition $(i, i + 1)$ squares
to the identity. In his celebrated 1925 paper “Theorie
der Zöpfe,” Artin [VI.86] showed that multiplication in
$B_n$ is completely described by the relations

$$\sigma_i \sigma_j = \sigma_j \sigma_i \quad (|i - j| \geq 2),$$

$$\sigma_i \sigma_{i+1} \sigma_i = \sigma_{i+1} \sigma_i \sigma_{i+1}.$$

These relations have subsequently acquired impor- 
tance in statistical physics, where they are known as 
the Yang-Baxter equations. 
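
The homomorphism from $B_n$ to $S_n$ is easy to compute once a braid is written as a word in the generators. In the sketch below the encoding is an assumption made for illustration: the integer `i` stands for $\sigma_i$ and `-i` for its inverse, and the function name is hypothetical. The function follows each string from its left-hand label to its right-hand label, forgetting which string passes over which.

```python
def braid_permutation(word, n):
    """Return the permutation of {1, ..., n} determined by a braid word.

    `word` is a sequence of nonzero integers: i means sigma_i and -i means
    the inverse of sigma_i. The result maps each left-hand label to the
    right-hand label of the same string.
    """
    strand_at = list(range(1, n + 1))  # strand_at[p-1]: string now in position p
    for generator in word:
        i = abs(generator)
        if not 1 <= i <= n - 1:
            raise ValueError("generator index out of range")
        # A crossing of either sign swaps the strings in positions i and i+1.
        strand_at[i - 1], strand_at[i] = strand_at[i], strand_at[i - 1]
    return {strand: position for position, strand in enumerate(strand_at, start=1)}


identity_3 = {1: 1, 2: 2, 3: 3}
sigma_1 = braid_permutation([1], 3)            # the transposition (1, 2)
sigma_1_squared = braid_permutation([1, 1], 3)
left = braid_permutation([1, 2, 1], 3)
right = braid_permutation([2, 1, 2], 3)
```

Both sides of the second relation give the same permutation, as they must, and $\sigma_1^2$ is sent to the identity even though it is not the trivial braid. Since $\sigma_i$ and its inverse have the same image, the code also makes it plain that the permutation forgets a great deal of information about the braid.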

In groups defined by generators and relations it is 
usually difficult (there being no method which works 
uniformly in all cases) to decide whether an arbitrary 
word in the generators represents the identity element 
(see GEOMETRIC AND COMBINATORIAL GROUP THEORY 
[IV.11]). For $B_n$, Artin solved this problem geometrically,
by “combing the braid.” An alternative algebraic
method, due to Garside (1967), also decides when two
elements in $B_n$ are conjugate.

In relation to the decidability of such questions, and
in many other respects, braid groups display close
affinities with linear groups: that is, groups in which all
elements behave as if they were invertible $N \times N$ matrices.
Although such similarities suggested that it should
be possible to prove that braid groups genuinely are
linear, the problem of doing so remained unsolved for
many years, until in 2001 a proof was eventually found
by Bigelow and independently by Krammer.


The groups described here are, strictly speaking,
braid groups of the plane, the plane being the object
punctured. Other braid groups also occur, often in
surprising contexts. The connection with statistical
physics has already been mentioned. They arise also
in algebraic geometry, when algebraic curves become
punctured by discarding exceptional points. Thus,
though originating in topology, braids may intervene
significantly in areas such as “constructive Galois
theory” that seem at first sight to be purely algebraic.


III.5 Buildings

Mark Ronan 


The invertible linear transformations on a vector space 
form a group, called the general linear group. If n is 
the dimension of the vector space and $K$ is the field of
scalars, then it is denoted by $\mathrm{GL}_n(K)$, and if we pick
a basis for the vector space, then each group element
can be written as an $n \times n$ matrix whose determinant
[III.15] is nonzero. This group and its subgroups are of
great interest in mathematics, and can be studied “geometrically”
in the following way. Instead of looking at
the vector space $V$, where of course the origin plays a
unique role and is fixed by the group, we use the projective
space [I.3 §6.7] associated with $V$: the points
of projective space are the one-dimensional subspaces 
of V, the lines are the two-dimensional subspaces, the 
planes are the three-dimensional subspaces, and so on. 

Several important subgroups of $\mathrm{GL}_n(K)$ can be
obtained by imposing constraints on the linear maps
(or matrices). For example, $\mathrm{SL}_n(K)$ consists of all
linear transformations of determinant 1. The group
$\mathrm{O}(n)$ consists of all linear transformations $\alpha$ of an
$n$-dimensional real inner-product space such that
$\langle \alpha v, \alpha w \rangle = \langle v, w \rangle$ for any two vectors $v$ and $w$ (or
in matrix terms all real matrices $A$ such that $AA^{\mathrm{T}} = I$);
more generally, one can define many similar subgroups
of $\mathrm{GL}_n(K)$ by taking all linear maps that preserve
certain forms, such as bilinear or sesquilinear
forms. These subgroups are called classical groups. The
classical groups are either simple or close to simple (for
example, we can often make them simple by quotienting
out by the subgroup of scalar matrices). When $K$
is the field of real or complex numbers, the classical
groups are Lie groups.

Lie groups and their classification are discussed in
Lie theory [III.50]: the simple Lie groups comprise the
classical groups, which fall into one of four families,
known as $A_n$, $B_n$, $C_n$, and $D_n$ (where $n$ is a natural
number), along with other types known as $E_6$, $E_7$, $E_8$,
$F_4$, and $G_2$. The subscripts are related to the dimensions
of the groups. For example, the groups of type
$A_n$ are the groups of invertible linear transformations
in $n + 1$ dimensions.

These simple Lie groups have analogues over any
field, where they are often referred to as groups of
Lie type. For example, $K$ can be a finite field, in which
case the groups are finite. It turns out that almost all
finite simple groups are of Lie type: see the classification
of finite simple groups [V.8]. A geometric
theory underlying the classical groups had been developed
by the first half of the twentieth century. It used
projective space and various subgeometries of projective
space, which made it possible to provide analogues
for the classical groups, but it did not provide analogues
for the groups of types $E_6$, $E_7$, $E_8$, $F_4$, and $G_2$. For
this reason, Jacques Tits looked for a geometric theory
that would embrace all families, and ended up creating
the theory of buildings.

The full abstract definition of a building is somewhat
complicated, so instead we shall try to give some
idea of the concept by looking at the building associated
with the groups $\mathrm{GL}_n(K)$ and $\mathrm{SL}_n(K)$, which are of
type $A_{n-1}$. This building is an abstract simplicial complex,
which can be thought of as a higher-dimensional
analogue of a graph [III.34]. It consists of a collection
of points called vertices; as in a graph, some pairs of
vertices form edges; however, it is then possible for
triples of vertices to form two-dimensional faces, and
for sets of $k$ vertices to form $(k - 1)$-dimensional “simplexes.”
(The geometrical meaning of the word “simplex”
is a convex hull of a finite set of points in general
position: for instance, a three-dimensional simplex
is a tetrahedron.) All faces of simplexes must also be
included, so for example three vertices cannot form a
two-dimensional face unless each pair is joined by an
edge.

To form the building of type $A_{n-1}$, we start by taking
all the 1-spaces, 2-spaces, 3-spaces, and so on (corresponding
to points, lines, planes, and so on, in projective
space), and treat them as “vertices.” The simplexes
are formed by all nested sequences of proper
subspaces: for example, a 2-space inside a 4-space
inside a 5-space will form a “triangle” whose vertices
are these three subspaces. The simplexes of maximal
dimension have $n - 1$ vertices: a 1-space inside a
2-space inside a 3-space, and so on. These simplexes are
called chambers.


There are many subspaces, so a building is a huge
object. However, buildings have important subgeometries
called apartments, which in the $A_{n-1}$ case are
obtained by taking a basis for the vector space, and
then taking all subspaces generated by subsets of this
basis. For example, in the $A_3$ case our vector space is
four dimensional, so a basis has four elements; its subsets
span four 1-spaces, six 2-spaces, and four 3-spaces.
To visualize this apartment it helps to view the four
1-spaces as the vertices of a tetrahedron, the six 2-spaces
as the midpoints of its edges, and the four 3-spaces as
the midpoints of its faces. The apartment has twenty-four
chambers, six for each face of the original tetrahedron,
and they form a triangular tiling of the surface
of the tetrahedron. This surface is topologically equivalent
to a sphere, as are all apartments of this building:
such buildings are called spherical. The buildings for
the groups of Lie type are all spherical, and, just as
$A_3$ is related to the tetrahedron, their apartments are
related to the regular and semiregular polyhedra in $n$
dimensions, where $n$ is the subscript in the Lie notation
given earlier.
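
The counts for the $A_3$ apartment can be verified combinatorially, since a subspace spanned by part of the basis is determined by the subset of basis vectors that spans it. The sketch below represents each such subspace by that subset; the basis labels `e1`, ..., `e4` and the variable names are assumptions of the illustration. A chamber is then a maximal nested chain of proper nonempty subsets.

```python
from itertools import combinations, permutations

basis = ("e1", "e2", "e3", "e4")  # assumed labels for a basis of a 4-dimensional space

# Vertices of the apartment: proper nonempty subsets of the basis,
# grouped by the dimension k of the subspace they span.
vertices = {
    k: [frozenset(subset) for subset in combinations(basis, k)]
    for k in range(1, len(basis))
}
vertex_counts = {k: len(subspaces) for k, subspaces in vertices.items()}

# Chambers: maximal chains (1-space inside 2-space inside 3-space).
# Each ordering of the basis gives one chain, by adding vectors one at a time.
chambers = {
    tuple(frozenset(order[:k]) for k in range(1, len(basis)))
    for order in permutations(basis)
}

# Chambers containing a given 3-space, that is, lying on a given face
# of the tetrahedron.
chambers_per_face = {
    face: sum(1 for chain in chambers if chain[-1] == face)
    for face in vertices[3]
}
```

The twenty-four chambers correspond to the $4! = 24$ orderings of the basis, which is one way to see why the symmetric group on four letters acts on this apartment.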

Buildings have the following two noteworthy features.
First, any two chambers lie in a common apartment:
this is not obvious in the example above but it can
be proved using linear algebra. Second, in any building
all apartments are isomorphic and any two apartments
intersect nicely: more precisely, if $A$ and $A'$ are apartments,
then $A \cap A'$ is convex and there is an isomorphism
from $A$ to $A'$ that fixes $A \cap A'$. These two features
were originally used by Tits in defining buildings.

The theory of spherical buildings does not just give
a pleasing geometric basis for the groups of Lie type:
it can also be used to construct those of types $E_6$, $E_7$,
$E_8$, and $F_4$, for an arbitrary field $K$, without the need for
sophisticated machinery such as Lie algebras. Once the
building has been constructed (and a construction can
be given in a surprisingly simple manner), a theorem of
Tits on the existence of automorphisms shows that the
groups themselves must exist.

In a spherical building the apartments are tilings of
a sphere, but other types of buildings also play significant
roles. Of particular importance are affine buildings,
in which the apartments are tilings of Euclidean space;
such buildings arise in a natural way from groups, such
as $\mathrm{GL}_n(K)$, where $K$ is a p-adic field [III.53]. For such
fields there are two buildings, one spherical and one
affine, but the affine one carries more information and
yields the spherical building as a structure “at infinity.”
Going beyond affine buildings, there are hyperbolic
buildings, whose apartments are tilings of hyperbolic
space; they arise naturally in the study of hyperbolic
Kac-Moody groups.


III.6 Calabi-Yau Manifolds 

Eric Zaslow 


1 Basic Definition 

Calabi-Yau manifolds, named after Eugenio Calabi and 
Shing-Tung Yau, arise in Riemannian geometry and 
algebraic geometry, and play a prominent role in string 
theory and mirror symmetry. 

In order to explain what they are, we need first
to recall the notion of orientability on a real manifold
[I.3 §6.9]. Such a manifold is orientable if you
can choose coordinate systems at each point in such
a way that any two systems $x = (x^1, \dots, x^m)$ and
$y = (y^1, \dots, y^m)$ that are defined on overlapping sets
give rise to a positive Jacobian: $\det(\partial y^i/\partial x^j) > 0$. The
notion of a Calabi-Yau manifold is the natural complex
analogue of this. Now the manifold is complex,
and for each local coordinate system $z = (z^1, \dots, z^n)$
one has a holomorphic function [I.3 §5.6] $f(z)$. It is
vital that $f$ should be nonvanishing: that is, it never
takes the value 0. There is also a compatibility condition:
if $\tilde{z}(z)$ is another coordinate system, then the
corresponding function $\tilde{f}$ is related to $f$ by the equation
$f = \tilde{f} \det(\partial \tilde{z}^a/\partial z^b)$. Note that in this definition,
if we replace all complex terms by real terms, then we
have the notion of a real orientation. So a Calabi-Yau
manifold can be thought of informally as a complex
manifold with complex orientation.

2 Complex Manifolds and Hermitian Structure 

Before we go any further, a few words about complex
and Kähler geometry are in order. A complex manifold
is a structure that looks locally like $\mathbb{C}^n$, in the sense
that one can find complex coordinates $z = (z^1, \dots, z^n)$
near every point. Moreover, where two coordinate systems
$z$ and $\tilde{z}$ overlap, the coordinates $\tilde{z}^a$ are holomorphic
when they are regarded as functions of the $z^b$.
Thus the notion of a holomorphic function on a complex
manifold makes sense and does not depend on the
coordinates used to express the function. In this way,
the local geometry of a complex manifold does indeed
look like an open set in $\mathbb{C}^n$, and the tangent space at a
point looks like $\mathbb{C}^n$ itself.


On complex vector spaces it is natural to consider
Hermitian inner products [III.37] represented by Hermitian
matrices [III.52 §3]$^1$ $g_{a\bar b}$ with respect to a basis
$e_a$. On complex manifolds, a Hermitian inner product
on the tangent spaces is called a “Hermitian metric,”
and is represented in a coordinate basis by a Hermitian
matrix $g_{a\bar b}$, which depends on position.

3 Holonomy, and Calabi-Yau Manifolds 
in Riemannian Geometry 

On a Riemannian manifold [I.3 §6.10] one can move a
vector along a path so as to keep it of constant length
and “always pointing in the same direction.” Curvature
expresses the fact that the vector you wind up with at
the end of the path depends on the path itself. When 
your path is a closed loop, the vector at the starting 
point comes back to a new vector at the same point. 
(A good example to think about is a path on a sphere 
that goes from the North Pole to the equator, then a 
quarter of the way around the equator, then back to 
the North Pole again. When the journey is completed, 
the “constant” vector that began by pointing south will 
have been rotated by 90°.) With each loop we associate
a matrix operator, called the holonomy matrix,
which sends the starting vector to the ending vector; 
the group generated by all of these matrices is called 
the holonomy group of the manifold. Since the length 
of the vector does not change during the process of 
keeping it constant along the loop, the holonomy matri- 
ces all lie in the orthogonal group of length-preserving 
matrices, O(m). If the manifold is oriented, then the 
holonomy group must lie in SO(m), as one can see by
transporting an oriented basis of vectors around the 
loop. 
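
The sphere example above can be checked numerically. The following is a minimal sketch, not a general-purpose routine: it approximates parallel transport on the unit sphere by repeatedly projecting the vector onto the tangent plane at the next point of the path and renormalizing. The function names `great_circle` and `transport` and the step count are hypothetical choices made for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def great_circle(p, q, steps):
    """Points along the great-circle arc from unit vector p to unit vector q."""
    omega = math.acos(max(-1.0, min(1.0, dot(p, q))))
    points = []
    for k in range(steps + 1):
        t = k / steps
        a = math.sin((1 - t) * omega) / math.sin(omega)
        b = math.sin(t * omega) / math.sin(omega)
        points.append(tuple(a * pi + b * qi for pi, qi in zip(p, q)))
    return points

def transport(v, path):
    """Approximate parallel transport: project onto each tangent plane, keep unit length."""
    for x in path[1:]:
        c = dot(v, x)
        v = tuple(vi - c * xi for vi, xi in zip(v, x))
        norm = math.sqrt(dot(v, v))
        v = tuple(vi / norm for vi in v)
    return v

north = (0.0, 0.0, 1.0)
a = (1.0, 0.0, 0.0)   # point on the equator
b = (0.0, 1.0, 0.0)   # a quarter of the way around the equator

v0 = (1.0, 0.0, 0.0)  # tangent vector at the North Pole, pointing south toward a
v = v0
for start, end in [(north, a), (a, b), (b, north)]:
    v = transport(v, great_circle(start, end, 1000))

angle = math.degrees(math.acos(max(-1.0, min(1.0, dot(v0, v)))))
print(angle)
```

The printed angle is the rotation picked up around the loop; for this particular loop it agrees with the 90° quoted in the text.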

Every complex manifold of (complex) dimension $n$
is also a real manifold of (real) dimension $m = 2n$,
which one can think of as coordinatized by the real
and imaginary parts of the complex coordinates $z^j$.
Real manifolds that arise in this way have additional
structure. For example, the fact that we can multiply
complex coordinate directions by $i = \sqrt{-1}$ implies
that there must be an operator on the real tangent
space that squares to $-1$. This operator has eigenvalues
$\pm i$, which can be thought of as “holomorphic” and
“anti-holomorphic” directions. The Hermitian property
states that these directions are orthogonal, and we say
that the manifold is Kähler if they remain so after
transport around loops. This means that the holonomy
group is a subgroup of U(n) (which itself is a
subgroup of SO(2n): complex manifolds always have
real orientations). There is a nice local characterization
of the Kähler property: if $g_{a\bar b}$ are the components of
the Hermitian metric in some coordinate patch, then
there exists a function $\varphi$ on that patch such that
$g_{a\bar b} = \partial^2 \varphi / \partial z^a \, \partial \bar z^b$.

1. The notation $g_{a\bar b}$ indicates the conjugate-linear property of a
Hermitian inner product.

Given a complex orientation (that is, the nonmetric
definition of a Calabi-Yau manifold given above), a
compatible Kähler structure leads to a holonomy that
lies in $SU(n) \subset U(n)$, the natural analogue of the case
of real orientation. This is the metric definition of a
Calabi-Yau manifold.
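
The simplest illustration of the local potential characterization of the Kähler property is flat space $\mathbb{C}^n$. The particular function $\varphi$ below is chosen here for illustration and is not taken from the text.

$$\varphi = \sum_{c=1}^{n} z^c\, \bar z^c
\quad\Longrightarrow\quad
g_{a\bar b} = \frac{\partial^2 \varphi}{\partial z^a\, \partial \bar z^b} = \delta_{ab} .$$

This is the standard flat Hermitian metric. Transport around any loop in flat space returns every vector unchanged, so the holonomy group is trivial and in particular lies in $SU(n)$; together with the constant orientation function $f \equiv 1$, flat $\mathbb{C}^n$ satisfies both the nonmetric and the metric conditions described above.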

4 The Calabi Conjecture 

Calabi conjectured that, for any Kähler manifold of
complex dimension $n$ and any complex orientation,
there exists a function $u$ and a new Kähler metric $\tilde g$,
given in coordinates by

$$\tilde g_{a\bar b} = g_{a\bar b} + \frac{\partial^2 u}{\partial z^a \, \partial \bar z^b},$$

that is compatible with the orientation. In equations,
the compatibility condition states that

$$\det\left(g_{a\bar b} + \frac{\partial^2 u}{\partial z^a \, \partial \bar z^b}\right) = |f|^2,$$

where $f$ is the holomorphic orientation function discussed
above. Thus, the metric notion of a Calabi-Yau
manifold amounts to a formidable nonlinear partial differential
equation for $u$. Calabi proved the uniqueness
and Yau proved the existence of a solution to this equation.
So in fact the metric definition of a Calabi-Yau
manifold is uniquely determined by its Kähler structure
and its complex orientation.
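
To see where the nonlinearity comes from, it may help to write the compatibility condition out in the two lowest dimensions. The shorthand $u_{a\bar b} = \partial^2 u / \partial z^a\, \partial \bar z^b$ is introduced here only for this illustration.

$$n = 1:\qquad g_{1\bar 1} + u_{1\bar 1} = |f|^2 ,$$

$$n = 2:\qquad (g_{1\bar 1} + u_{1\bar 1})(g_{2\bar 2} + u_{2\bar 2}) - (g_{1\bar 2} + u_{1\bar 2})(g_{2\bar 1} + u_{2\bar 1}) = |f|^2 .$$

For $n = 1$ the equation is linear in $u$: writing $z = x + iy$, one has $\partial^2/\partial z\, \partial \bar z = \tfrac14(\partial_x^2 + \partial_y^2)$, so it is a Poisson equation. From $n = 2$ onward the determinant contains products of second derivatives of $u$, which is what makes the equation nonlinear.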

Yau’s theorem establishes that the space of metrics 
with holonomy group SU(n) on a manifold with com- 
plex orientation is in correspondence with the space 
of inequivalent Kähler structures. The latter space can
easily be probed with the techniques of algebraic geom- 
etry. 

5 Calabi-Yau Manifolds in Physics 

Einstein’s theory of gravity, general relativity,
constructs equations that the metric of a Riemannian
space-time manifold must obey (see general relativity
and the Einstein equations [IV.17]). The
equations involve three symmetric tensors: the metric,
the Ricci curvature [III.80] tensor, and the energy-
momentum tensor of matter. A Riemannian manifold
whose Ricci tensor vanishes is a solution to these equations
when there is no matter, and is a special case
of an Einstein manifold. A Calabi-Yau manifold with
its unique SU(n)-holonomy metric has vanishing Ricci
tensor, and is therefore of interest in general relativity.

A fundamental problem in theoretical physics is the 
incorporation of Einstein’s theory into the quantum 
theory of particles. This enterprise is known as quan- 
tum gravity, and Calabi-Yau manifolds figure promi- 
nently in the leading theory of quantum gravity, string 
theory [IV.13 §2].

In string theory, the fundamental objects are one- 
dimensional “strings.” The motion of the strings 
through space-time is described by two-dimensional 
trajectories, known as worldsheets, so every point 
on the worldsheet is labeled by the point in space- 
time where it sits. In this way, string theory is constructed
from a quantum field theory of maps from
two-dimensional Riemann surfaces [III.81] to a space-
time manifold M. The two-dimensional surface should 
be given a Riemannian metric, and there is an infinite- 
dimensional space of such metrics to consider. This 
means that we must solve quantum gravity in two
dimensions, a problem that, like its four-dimensional
cousin, is too hard. If, however, it happens that the two-
dimensional worldsheet theory is conformal (invari- 
ant under local changes of scale), then just a finite- 
dimensional space of conformally inequivalent metrics 
remains, and the theory is well-defined. 

The Calabi-Yau condition arises from these consid- 
erations. The requirement that the two-dimensional 
theory is conformal, so that the string theory makes 
good sense, is in essence the requirement that the 
Ricci tensor of space-time should vanish. Thus, a two- 
dimensional condition leads to a space-time equation, 
which turns out to be exactly Einstein’s equation without
matter. We add to this condition the “phenomenological”
criterion that the theory be endowed with
“supersymmetry,” which requires the space-time man- 
ifold M to be complex. The two conditions together 
mean that M is a complex manifold with holonomy 
group SU(n): that is, a Calabi-Yau manifold. By Yau’s 
theorem, the choices of such M can easily be described 
by algebraic geometric methods. 

We remark that there is a kind of distillation of string 
theory called “topological strings,” which can be given 
a rigorous mathematical framework. Calabi-Yau manifolds
are both symplectic and complex, and this leads
to two versions of topological strings, called A and B,
that one can associate with a Calabi-Yau manifold. Mir- 
ror symmetry is the remarkable phenomenon that the 
A version of one Calabi-Yau manifold is related to the 
B version of an entirely different “mirror partner.” The 
mathematical consequences of such an equivalence are 
extremely rich. (See mirror symmetry [IV.14] for more 
details. For other notions related to those discussed in 
this article, see symplectic manifolds [III.90].) 


The Calculus of Variations 

See Variational Methods [III.96] 


III.7 Cardinals


The cardinality of a set is a measure of how large that 
set is. More precisely, two sets are said to have the same 
cardinality if there is a bijection between them. So what 
do cardinalities look like? 

There are finite cardinalities, meaning the cardinalities
of finite sets: a set has “cardinality n” if it has
precisely n elements. Then there are countable [III.11]
infinite sets: these all have the same cardinality (this
follows from the definition of “countable”), usually
written $\aleph_0$. For example, the natural numbers, the
integers, and the rationals all have cardinality $\aleph_0$.
However, the reals are uncountable, and so do not have
cardinality $\aleph_0$. In fact, their cardinality is denoted by
$2^{\aleph_0}$.

It turns out that cardinals can be added and multiplied
and even raised to powers of other cardinals (so
“$2^{\aleph_0}$” is not an isolated piece of notation). For details,
and more explanation, see set theory [IV.1 §2].
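
The claim that the natural numbers and the integers have the same cardinality can be made concrete by writing down a bijection. The sketch below is one standard choice, under the assumption that the natural numbers start at 0; the function names are hypothetical.

```python
def nat_to_int(n: int) -> int:
    """Bijection from {0, 1, 2, ...} onto the integers: 0, 1, -1, 2, -2, ..."""
    return (n + 1) // 2 if n % 2 else -(n // 2)

def int_to_nat(z: int) -> int:
    """Inverse of nat_to_int: positive integers go to odd numbers, the rest to even numbers."""
    return 2 * z - 1 if z > 0 else -2 * z

first_values = [nat_to_int(n) for n in range(7)]
round_trip_ok = all(int_to_nat(nat_to_int(n)) == n for n in range(1000))
print(first_values, round_trip_ok)
```

Because each function undoes the other, every integer is hit exactly once, which is what “the same cardinality” requires.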


III.8 Categories 

Eugenia Cheng 


When we study groups [I.3 §2.1] or vector spaces
[I.3 §2.3], we pay particular attention to certain classes
of maps between them: the important maps between
groups are the group homomorphisms [I.3 §4.1], and
between vector spaces they are the linear maps
[I.3 §4.2]. What makes these maps important is that
they are the functions that “preserve structure”: for
example, if $\phi$ is a homomorphism from a group G to a
group H, then it “preserves multiplication,” in the sense
that $\phi(g_1 g_2) = \phi(g_1)\phi(g_2)$ for any pair of elements
$g_1$ and $g_2$ of G. Similarly, linear maps preserve addition
and scalar multiplication.


The notion of a structure-preserving map applies far 
more generally than just to these two examples, and 
one of the purposes of category theory is to understand 
the general properties of such maps. For instance, if A,
B, and C are mathematical structures of some given
type, and $f$ and $g$ are structure-preserving maps from
A to B and from B to C, respectively, then their composite
$g \circ f$ is a structure-preserving map from A to C. That
is, structure-preserving maps can be composed (at least
if the range of one equals the domain of the other). We
also use structure-preserving maps to decide when to 
regard two structures as “essentially the same”: we call 
A and B isomorphic if there is a structure-preserving 
map from A to B with an inverse that also preserves 
structure. 

A category is a mathematical structure that allows 
one to discuss properties such as these in the abstract. 
It consists of a collection of objects, together with mor- 
phisms between those objects. That is, if a and b are 
two objects in the category, then there is a collection of 
morphisms between a and b. There is also a notion of 
composition of morphisms: if $f$ is a morphism from a
to b and $g$ is a morphism from b to c, then there is a
composite of $f$ and $g$, which is a morphism from a to c.
This composition must be associative. In addition, for
each object a there is an “identity morphism,” which
has the property that if you compose it with another
morphism $f$ then you get $f$.

As the earlier discussion suggests, an example of 
a category is the category of groups. The objects of 
this category are groups, the morphisms are group 
homomorphisms, and composition and the identity are 
defined in the way we are used to. However, it is by no 
means the case that all categories are like this, as the 
following examples show. 

(i) We can form a category by taking the natural numbers
as its objects, and letting the morphisms from
n to m be all the n × m matrices with real entries.
Composition of morphisms is the usual matrix
multiplication. We would not normally think of
an n × m matrix as a map from the number n to
the number m, but the axioms for a category are
nevertheless satisfied.

(ii) Any set can be turned into a category: the objects
are the elements of the set, and a morphism from
x to y is the assertion “x = y.” We can also make
an ordered set into a category by letting a morphism
from x to y be the assertion “x ≤ y.” (The
“composite” of “x ≤ y” and “y ≤ z” is “x ≤ z.”)

(iii) Any group G can be made into a category as follows:
you have just one object, and the morphisms
from that object to itself are the elements of the
group, with the group multiplication defining the
composition of two morphisms.

(iv) There is an obvious category where the objects are 
topological spaces [III.92] and the morphisms 
are continuous functions. A less obvious category 
with the same objects takes as its morphisms 
not continuous functions but homotopy classes 
[IV.10 §2] of continuous functions. 
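
Example (i) in the list above can be checked mechanically. The sketch below represents a morphism from n to m as a list of n rows of length m, with integer entries so that equalities are exact; the helper names `identity` and `compose` are hypothetical, and only a single sample of the axioms is tested, which illustrates them rather than proves them.

```python
def identity(n):
    """Identity morphism on the object n: the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def compose(f, g):
    """Composite 'f then g' of f : n -> m (an n x m matrix) and g : m -> p (an m x p matrix)."""
    m = len(g)
    if any(len(row) != m for row in f):
        raise ValueError("the target of f must equal the source of g")
    p = len(g[0])
    return [[sum(row[k] * g[k][j] for k in range(m)) for j in range(p)] for row in f]

f = [[1, 2, 3], [4, 5, 6]]                       # morphism 2 -> 3
g = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1]]   # morphism 3 -> 4
h = [[1], [2], [3], [4]]                         # morphism 4 -> 1

associative = compose(compose(f, g), h) == compose(f, compose(g, h))
left_identity = compose(identity(2), f) == f
right_identity = compose(f, identity(3)) == f
print(associative, left_identity, right_identity)
```

Nothing in the code treats the number 2 as a set with elements; the objects serve only as labels that decide which composites are defined, which is the point of the example.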

Morphisms are also called maps. However, as the 
above examples illustrate, the maps in a category do 
not have to be remotely map-like. They are also called 
arrows, partly to emphasize the more abstract nature 
of a general category, and partly because arrows are 
often used to represent morphisms pictorially. 

The general framework and language of “objects and 
morphisms” enable us to seek and study structural fea- 
tures that depend only on the “shape” of the category, 
that is, on its morphisms and the equations they sat- 
isfy. The idea is both to make general arguments that 
are then applicable to all categories possessing partic- 
ular structural features, and also to be able to make 
arguments in specific environments without having to 
go into the details of the structures in question. The use 
of the former to achieve the latter is sometimes referred 
to, endearingly or otherwise, as “abstract nonsense.” 

As we mentioned above, the morphisms in a category
are generally depicted as arrows, so a morphism
$f$ from $a$ to $b$ is depicted as $a \xrightarrow{f} b$ and composition
is depicted by concatenating the arrows $a \xrightarrow{f} b \xrightarrow{g} c$.
This notation greatly eases complex calculations and
gives rise to the so-called commutative diagrams that
are often associated with category theory; an equality
between composites of morphisms such as $g \circ f = t \circ s$
is expressed by asserting that the following diagram
commutes, that is, that either of the two different paths
from $a$ to $c$ yields the same composite:

[Diagram: a square in which one path from $a$ to $c$ is $f$ followed by $g$ and the other is $s$ followed by $t$.]

Proving that one long string of compositions equals an- 
other then becomes a matter of “filling in” the space in 
between with smaller diagrams that are already known 
to commute. Furthermore, many important mathematical
concepts can be described in terms of commutative
diagrams: some examples are free groups, free rings,
free algebras, quotients, products, disjoint unions, 
function spaces, direct and inverse limits, completion, 
compactification, and geometric realization. 

Let us see how it is done in the case of disjoint unions.
We say that a disjoint union of sets A and B is another
set U equipped with morphisms $p : A \to U$ and $q : B \to U$
such that, given any set X and morphisms $A \to X$
and $B \to X$, there is a unique morphism $U \to X$ that
makes the following diagram commute:

[Diagram: A and B map into U via p and q and also map into X; the morphism from U to X makes both triangles commute.]

Here p and q tell us how A and B inject into the dis- 
joint union. The “such that” part of the definition above 
is a universal property. It expresses the fact that giving
a function from the disjoint union to another set
is precisely the same as giving a function from each
of the individual sets; this completely characterizes
a disjoint union (which we regard as defined up to
isomorphism). Another viewpoint is that the universal
property expresses the fact that a disjoint union
is the “most free” way of having two sets map into
another set, neither adding any information nor collapsing
any information. Universal properties are central
to the way category theory describes structures
that are somehow “canonical.” (See also the discussion
of free groups in geometric and combinatorial
group theory [IV.11].)
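
The universal property of the disjoint union described above can be sketched in code. This is one possible model, assuming that elements of the disjoint union are represented as tagged pairs; the names `inl`, `inr`, and `copair`, and the sample maps `f` and `g` into the integers, are hypothetical.

```python
def inl(a):
    """The injection p : A -> U, tagging elements of A with 0."""
    return (0, a)

def inr(b):
    """The injection q : B -> U, tagging elements of B with 1."""
    return (1, b)

def disjoint_union(A, B):
    return {inl(a) for a in A} | {inr(b) for b in B}

def copair(f, g):
    """The map h : U -> X determined by f : A -> X and g : B -> X."""
    def h(u):
        tag, value = u
        return f(value) if tag == 0 else g(value)
    return h

A = {1, 2, 3}
B = {3, 4}            # overlaps A on purpose; the tags keep the copies apart
U = disjoint_union(A, B)

f = lambda a: a * 10  # a sample map A -> X, with X the integers
g = lambda b: -b      # a sample map B -> X
h = copair(f, g)

triangles_commute = all(h(inl(a)) == f(a) for a in A) and all(h(inr(b)) == g(b) for b in B)
print(len(U), triangles_commute)
```

Uniqueness is visible in the code as well: every element of `U` is `inl(a)` or `inr(b)` for exactly one `a` or `b`, so the two commuting triangles leave no freedom in the definition of `h`.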

Another key concept in a category is that of an iso- 
morphism. As one might expect, this is defined to be a 
morphism with a two-sided inverse. Isomorphic objects 
in a given category are thought of as “the same, as far as 
this particular category is concerned.” Thus, categories 
provide a framework in which the most natural way of 
classifying objects is “up to isomorphism.” 

Categories are mathematical structures of a certain 
kind, and as such they themselves form a category 
(subject to size restrictions so as to avoid a Russell- 
type paradox). The morphisms, which are the structure- 
preserving maps for categories, are called functors. In
other words, a functor F from a category X to a category
Y takes the objects of X to the objects of Y and
the morphisms of X to the morphisms of Y in such
a way that the identity of a is taken to the identity
of Fa and the composite of $f$ and $g$ is taken to the
composite of Ff and Fg. An important example of a
functor is the one that takes a topological space S with
a “marked point” s to its fundamental group $\pi_1(S, s)$:
it is one of the basic theorems of algebraic topology 
that a continuous map between two topological spaces 
(that takes marked point to marked point) gives rise to 
a homomorphism between their fundamental groups. 
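
The two functor conditions (identities go to identities, composites go to composites) can be tested on a much simpler functor than the fundamental group. The sketch below uses an example that is not from the text: the assignment sending a set to the lists of its elements and a function `f` to “apply `f` to every entry.” All names are hypothetical, and the check is on one sample list only.

```python
def fmap(f):
    """Send a function f : A -> B to the function list[A] -> list[B]."""
    return lambda xs: [f(x) for x in xs]

def compose(f, g):
    """Composite 'f then g'."""
    return lambda x: g(f(x))

identity = lambda x: x
f = lambda n: n + 1
g = lambda n: str(n)
sample = [1, 2, 3]

preserves_identity = fmap(identity)(sample) == identity(sample)
preserves_composition = fmap(compose(f, g))(sample) == compose(fmap(f), fmap(g))(sample)
print(preserves_identity, preserves_composition)
```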

Furthermore, there is a notion of morphism between 
functors, called a natural transformation, which is 
analogous to the notion of homotopy between maps of 
topological spaces. Given continuous maps $F, G : X \to Y$,
a homotopy from F to G gives us, for every point x in
X, a path in Y from Fx to Gx; analogously, given functors
$F, G : X \to Y$, a natural transformation from F to
G gives us, for every point x in X, a morphism in Y from
Fx to Gx. There is also a commuting condition that is
analogous to the fact that, in the case of homotopy, a
path in X must have its image under F continuously 
transformed to its image under G without passing over 
any “holes” in the space Y. This avoidance of holes is 
expressed in the category case by the commutativity of 
certain squares in the target category Y, which is known 
as the “naturality condition.” 
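
Written out, the naturality condition takes the following standard form. The symbols $\eta$ for the natural transformation, $\eta_x$ for its component at $x$, and $h$ for a morphism of X are notation introduced here for illustration.

$$\text{for every morphism } h : x \to x' \text{ in } X:\qquad
G(h) \circ \eta_x \;=\; \eta_{x'} \circ F(h),$$

where $\eta_x : Fx \to Gx$ and $\eta_{x'} : Fx' \to Gx'$ are morphisms in Y. Both sides are morphisms from $Fx$ to $Gx'$, and the equation says that the square with sides $F(h)$, $\eta_{x'}$, $\eta_x$, and $G(h)$ commutes.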

One example of a natural transformation encodes the 
fact that every vector space is canonically isomorphic
to its double dual; there is a functor from the category 
of vector spaces to itself that takes each vector space 
to its double dual, and there is an invertible natural 
transformation from this functor to the identity func- 
tor via the canonical isomorphisms. By contrast, every 
finite-dimensional vector space is isomorphic to its 
dual, but not canonically so because the isomorphism 
involves an arbitrary choice of basis; if we attempt to 
construct a natural transformation in this case, we find 
that the naturality condition fails. In the presence of 
natural transformations, categories actually form a 2- 
category, which is a two-dimensional generalization of 
a category, with objects, morphisms, and morphisms 
between morphisms. These last are thought of as two- 
dimensional morphisms; more generally an n-category 
has morphisms for each dimension up to n. 

Categories and the language of categories are used 
in a wide variety of other branches of mathematics. 
Historically, the subject is closely associated with alge- 
braic topology; the notions were first introduced in 
1945 by Eilenberg and Mac Lane. Applications followed
in algebraic geometry, theoretical computer science,
theoretical physics, and logic. Category theory, with its
abstract nature and lack of dependency on other fields
of mathematics, can be thought of as “foundational.” In
fact, it has been proposed as an alternative candidate
for the foundations of mathematics, with the notion of
morphism as the basic one from which everything else
is built up, instead of the relation of set membership
that is used in set-theoretic foundations [IV.1 §4].


Class Field Theory 

See FROM QUADRATIC RECIPROCITY TO 
CLASS FIELD THEORY [V.30] 


Cohomology 

See HOMOLOGY AND COHOMOLOGY [III.39] 


III.9 Compactness and 
Compactification 

Terence Tao 


In mathematics, it is well-known that the behavior of
finite sets and the behavior of infinite sets can be rather
different. For instance, each of the following statements
is easily seen to be true whenever X is a finite set but
false whenever X is an infinite set.

All functions are bounded. If $f : X \to \mathbb{R}$ is a real-
valued function on X, then $f$ must be bounded (i.e.,
there exists a finite number M such that $|f(x)| \le M$
for all $x \in X$).

All functions attain a maximum. If $f : X \to \mathbb{R}$ is a real-
valued function on X, then there must exist at least
one point $x_0 \in X$ such that $f(x_0) \ge f(x)$ for all
$x \in X$.

All sequences have constant subsequences. If $x_1, x_2,
x_3, \ldots \in X$ is a sequence of points in X, then there
must exist a subsequence $x_{n_1}, x_{n_2}, x_{n_3}, \ldots$ that is
constant. In other words, $x_{n_1} = x_{n_2} = \cdots = c$ for
some $c \in X$. (This fact is sometimes known as the
infinite pigeonhole principle.)

The first statement, that all functions on a finite set
are bounded, can be viewed as a very simple example
of a local-to-global principle. The hypothesis is an
assertion of “local” boundedness: it asserts that $|f(x)|$
is bounded for each point $x \in X$ separately, but with
a bound that may depend on x. The conclusion is that
of “global” boundedness: that $|f(x)|$ is bounded by a
single bound M for all $x \in X$.

So far we have viewed the object X only as a set.
However, in many areas of mathematics we like to
endow our objects with additional structures, such as a
topology [III.92], a metric [III.58], or a group structure
[I.3 §2.1]. When we do this, it turns out that some
objects exhibit properties similar to those of finite
sets (in particular, they enjoy local-to-global principles),
even though as sets they are infinite. In the categories
of topological spaces and metric spaces, these “almost-
finite” objects are known as compact spaces. (Other categories
have “almost-finite” objects as well. For example,
in the category of groups there is a notion of a
pro-finite group; for linear operators [III.52] between
normed spaces [III.64] the analogous notion is that of
a compact operator, which is “almost of finite rank”;
and so forth.)

A good example of a compact set is the closed unit 
interval X = [0, 1]. This is an infinite set, so the previ- 
ous three assertions are all false as stated for X. But if 
we modify them by inserting topological concepts such 
as continuity and convergence, then we can restore 
these assertions for [0, 1] as follows. 

All continuous functions are bounded. If $f : X \to \mathbb{R}$ is
a real-valued continuous function on X, then $f$ must
be bounded. (This is again a type of local-to-global
principle: if a function does not vary too much locally,
then it does not vary too much globally.)

All continuous functions attain a maximum. If $f :
X \to \mathbb{R}$ is a real-valued continuous function on X,
then there must exist at least one point $x_0 \in X$ such
that $f(x_0) \ge f(x)$ for all $x \in X$.

All sequences have convergent subsequences. If
$x_1, x_2, x_3, \ldots \in X$ is a sequence of points in X, then
there must exist a subsequence $x_{n_1}, x_{n_2}, x_{n_3}, \ldots$ that
converges to some limit $c \in X$. (This statement is
known as the Bolzano-Weierstrass theorem.)

To these assertions we can add a fourth (which, like the 
others, has a rather trivial analogue for finite sets). 

All open covers have finite subcovers. If $\mathcal{V}$ is a collection
of open sets and the union of all these open
sets contains X (in which case $\mathcal{V}$ is called an open
cover of X), then there must exist a finite subcollection
$V_{n_1}, V_{n_2}, \ldots, V_{n_k}$ of sets in $\mathcal{V}$ that still covers
X.

All four of these topological statements are false for 
sets such as the open unit interval (0,1) or the real 
line R, as one can easily check by constructing simple 
counterexamples. The Heine-Borel theorem asserts that 
when X is a subset of a Euclidean space $\mathbb{R}^n$, the above
statements are all true when X is topologically closed
and bounded, and all false otherwise.
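
For the open unit interval $X = (0,1)$, counterexamples to the four topological statements can be written down explicitly. The particular choices below are illustrative and are not taken from the text.

$$\begin{aligned}
&\text{(1) } f(x) = 1/x \text{ is continuous on } (0,1) \text{ but unbounded.}\\
&\text{(2) } f(x) = x \text{ is continuous, with } \sup f = 1, \text{ but } f(x) < 1 \text{ for every } x \in (0,1).\\
&\text{(3) } x_n = 1/(n+1): \text{ every subsequence converges to } 0 \notin (0,1).\\
&\text{(4) } V_n = (1/n,\, 1),\ n \ge 2: \ \bigcup_n V_n = (0,1), \text{ but } V_{n_1} \cup \cdots \cup V_{n_k} = (1/N,\, 1) \text{ with } N = \max_i n_i .
\end{aligned}$$

In (4) the finite union misses every point of $(0, 1/N]$, so no finite subcollection covers the interval. Each counterexample exploits the missing endpoint 0 or 1, in line with the Heine-Borel criterion that $(0,1)$ is bounded but not closed.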

The above four assertions are closely related to each 
other. For instance, if you know that all sequences 
in X contain convergent subsequences, then you can 
quickly deduce that all continuous functions have a 
maximum. This is done by first constructing a maximizing
sequence (a sequence of points $x_n$ in X such that
$f(x_n)$ approaches the maximal value of $f$ or, more precisely,
its supremum) and then investigating a convergent
subsequence of that sequence. In fact, given some
fairly mild assumptions on the space X (e.g., that X 
is a metric space), one can deduce any of these four 
statements from any of the others. 

To oversimplify a little, we say that a topological 
space X is compact if one (and hence all) of the above 
four assertions holds for X. Because the four assertions 
are not quite equivalent in general, the formal defini- 
tion of compactness uses only the fourth version: that 
every open cover has a finite subcover. There are other
notions of compactness, such as sequential compactness,
for example, which is based on the third version,
but the distinctions between these notions are technical
and we shall gloss over them here. 

Compactness is a powerful property of spaces, and it 
is used in many ways in many different areas of math- 
ematics. One is via appeal to local-to-global principles: 
one establishes local control on a function, or on some 
other quantity, and then uses compactness to boost 
the local control to global control. Another is to locate 
maxima or minima of a function, which is particularly 
useful in the calculus of variations [III.96]. A third 
is to partially recover the notion of a limit when deal- 
ing with nonconvergent sequences, by accepting the 
need to pass to a subsequence of the original sequence. 
(However, different subsequences may converge to dif- 
ferent limits; compactness guarantees the existence of 
a limit point, but not its uniqueness.) Compactness of 
one object also tends to beget compactness of other 
objects; for instance, the image of a compact set under 
a continuous map is still compact, and the product 
of finitely many or even infinitely many compact sets 
continues to be compact. This last result is known as 
Tychonoff’s theorem.

Of course, many spaces of interest are not compact. 
An obvious example is the real line R, which is not com- 
pact, because it contains sequences such as 1,2,3,... 
that are “trying to escape” the real line and that do not 
leave behind any convergent subsequences. However, 
one can often recover compactness by adding a few
more points to the space: this process is known as compactification.
For instance, one can compactify the real
line by adding one point at each end: we call the added
points $+\infty$ and $-\infty$. The resulting object, known as the
extended real line $[-\infty, +\infty]$, can be given a topology
in a natural way, which basically defines what it means
to converge to $+\infty$ or to $-\infty$. The extended real line is
compact: any sequence $x_n$ of extended real numbers
will have a subsequence that either converges to $+\infty$,
converges to $-\infty$, or converges to a finite number. Thus,
by using this compactification of the real line, we can
generalize the notion of a limit to one that no longer
has to be a real number. While there are some drawbacks
to dealing with extended reals instead of ordinary
reals (for instance, one can always add two real
numbers together, but the sum of $+\infty$ and $-\infty$ is undefined),
the ability to take limits of what would otherwise
be divergent sequences can be very useful, particularly
in the theory of infinite series and improper integrals.

It turns out that a single noncompact space can have 
many different compactifications. For instance, by the 
device of stereographic projection, one can topologically
identify the real line with a circle that has a single
point removed. (For example, if one maps the real
number $x$ to the point $(x/(1 + x^2),\, x^2/(1 + x^2))$, then
$\mathbb{R}$ maps to the circle of radius $\tfrac12$ and center $(0, \tfrac12)$, with
the north pole $(0, 1)$ removed.) If we then insert the
missing point, we obtain the one-point compactification
$\mathbb{R} \cup \{\infty\}$ of the real line. More generally, any reasonable
topological space (e.g., a locally compact Hausdorff
space) has a number of compactifications, ranging from
the one-point compactification $X \cup \{\infty\}$, which is the
“minimal” compactification as it adds only one point, to
the Stone-Čech compactification $\beta X$, which is the “maximal”
compactification, and adds an enormous number
of points. The Stone-Čech compactification $\beta\mathbb{N}$ of the
natural numbers $\mathbb{N}$ is the space of ultrafilters, which
are very useful tools in the more infinitary parts of
mathematics.
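
The stereographic-projection formula above is easy to check numerically. The sketch below is only a spot check on a handful of sample points: it confirms that each image lies at distance 1/2 from the center (0, 1/2) and that large |x| lands near the north pole (0, 1). The function names and the sample values are hypothetical.

```python
import math

def to_circle(x):
    """Image of the real number x under x -> (x/(1+x^2), x^2/(1+x^2))."""
    return (x / (1 + x * x), x * x / (1 + x * x))

def distance_from_center(point):
    """Distance from the point to the center (0, 1/2) of the circle."""
    return math.hypot(point[0], point[1] - 0.5)

samples = [-1e6, -10.0, -1.0, 0.0, 0.5, 3.0, 1e6]
radii = [distance_from_center(to_circle(x)) for x in samples]
far_point = to_circle(1e6)
print(radii, far_point)
```

Algebraically, the squared distance is $(x^2+1)^2 / \bigl(4(1+x^2)^2\bigr) = \tfrac14$ for every real $x$, which is why the numerical radii all agree with 1/2 up to rounding.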

One can use compactifications to distinguish be- 
tween different types of divergence in a space. For 
instance, the extended real line $[-\infty, +\infty]$ distinguishes
between divergence to $+\infty$ and divergence to $-\infty$. In a
similar spirit, by using compactifications of the plane
$\mathbb{R}^2$ such as the projective plane [I.3 §6.7], one can
distinguish a sequence that diverges along (or near) the x-
axis from a sequence that diverges along (or near) the 
y-axis. Such compactifications arise naturally in situa- 
tions in which sequences that diverge in different ways 
exhibit markedly different behavior. 


Another use of compactifications is to allow one to 
rigorously view one type of mathematical object as a 
limit of others. For instance, one can view a straight 
line in the plane as the limit of increasingly large circles 
by describing a suitable compactification of the space 
of circles that includes lines. This perspective allows 
us to deduce certain theorems about lines from analo- 
gous theorems about circles, and conversely to deduce 
certain theorems about very large circles from theo- 
rems about lines. In a rather different area of mathe- 
matics, the Dirac delta function is not, strictly speaking, 
a function, but it exists in certain (local) compactifications
of spaces of functions, such as spaces of measures
[III.57] or distributions [III.18]. Thus, one can
view the Dirac delta function as a limit of classical func- 
tions, and this can be very useful for manipulating it. 
One can also use compactifications to view the continuous
as the limit of the discrete: for instance, it is possible
to compactify the sequence $\mathbb{Z}/2\mathbb{Z}, \mathbb{Z}/3\mathbb{Z}, \mathbb{Z}/4\mathbb{Z}, \ldots$ of
cyclic groups in such a way that their limit is the circle
group $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. These simple examples can be generalized
to much more sophisticated examples of compactifications,
which have many applications in geometry,
analysis, and algebra.
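
One elementary way to make the cyclic-group example plausible is to place $\mathbb{Z}/n\mathbb{Z}$ inside the circle as the multiples of 1/n and watch the gaps shrink. The sketch below does only that; it is not the compactification construction itself, and the names `embed`, `add_on_circle`, and `covering_radius` are hypothetical.

```python
from fractions import Fraction

def embed(k, n):
    """Send the residue k in Z/nZ to the point k/n of the circle R/Z, as a fraction in [0, 1)."""
    return Fraction(k % n, n)

def add_on_circle(s, t):
    """Addition in R/Z on representatives in [0, 1)."""
    return (s + t) % 1

def covering_radius(n):
    """Largest distance on the circle from any point to the nearest embedded point."""
    points = sorted(embed(k, n) for k in range(n))
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(1 - points[-1] + points[0])   # the gap that wraps around past 1
    return max(gaps) / 2

n = 12
respects_addition = all(
    embed((j + k) % n, n) == add_on_circle(embed(j, n), embed(k, n))
    for j in range(n) for k in range(n)
)
radii = [covering_radius(m) for m in (2, 3, 4, 100)]
print(respects_addition, radii)
```

The embedding respects addition, and every point of the circle lies within 1/(2n) of the image of $\mathbb{Z}/n\mathbb{Z}$, a distance that tends to 0 as n grows.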


III.10 Computational Complexity
Classes


One of the basic challenges of theoretical computer sci- 
ence is to determine what computational resources are 
necessary in order to perform a given computational 
task. The most basic resource is time, or equivalently 
(given the hardware) the number of steps needed to 
implement the most efficient algorithm that will actu- 
ally carry out the task. Especially important is how this 
time scales up with the size of the input for the task: 
for instance, how much longer does it take to factorize 
an integer with 2n digits than an integer with n digits?
Another resource connected with the feasibility of
a computation is memory: one can ask how much storage
space a computer will need in order to implement
an algorithm, and how this can be minimized. A com- 
plexity class is a set of computational problems that can 
be performed with certain restrictions on the resources 
allowed. For instance, the complexity class P consists 
of all problems that can be performed in “polynomial 
time”: that is, there is some positive integer k such that 
if the size of the problem is n (in the example above, the 
size was the number of digits of the integer to be fac- 
torized), then the computation can be carried out in at
most $n^k$ steps. A problem belongs to P if and only if the
time taken to solve it scales up by at most a constant 
factor when the size of the input scales up by a con- 
stant factor. A good example of such a problem is mul- 
tiplication of two n-digit numbers: if you use ordinary 
long multiplication, then replacing n by 2n increases 
the time taken by a factor of 4. 
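
The factor of 4 can be checked directly. The following is a minimal sketch, assuming that "time" is measured by the number of single-digit multiplications; the function name `long_multiply` and that choice of cost measure are illustrative assumptions, not part of the text above.

```python
def long_multiply(x: int, y: int) -> tuple[int, int]:
    """Schoolbook multiplication of two nonnegative integers.

    Returns (product, steps), where steps counts single-digit
    multiplications (an assumed measure of running time).
    """
    xs = [int(d) for d in reversed(str(x))]  # least significant digit first
    ys = [int(d) for d in reversed(str(y))]
    result = [0] * (len(xs) + len(ys))
    steps = 0
    for i, a in enumerate(xs):
        carry = 0
        for j, b in enumerate(ys):
            total = result[i + j] + a * b + carry
            result[i + j] = total % 10
            carry = total // 10
            steps += 1
        result[i + len(ys)] += carry  # leftover carry of this row
    product = int("".join(str(d) for d in reversed(result)))
    return product, steps


product, steps_n = long_multiply(1234, 5678)             # n = 4 digits
_, steps_2n = long_multiply(12345678, 87654321)          # 2n = 8 digits
print(product == 1234 * 5678, steps_2n / steps_n)
```

With this cost measure an n-digit by n-digit product uses n^2 steps, so doubling n multiplies the count by 4.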

Suppose that you are presented with a positive inte- 
ger x and told that it is a product of two primes p and 
q. How difficult is it to determine p and q? Nobody
knows, but one thing is easy to see: if you are told p 
and q, then it is not hard (for a computer, at any rate) to 
check that pq really does equal x. Indeed, as we have 
just seen, long multiplication takes polynomial time, 
and comparing the answer with x is even easier. The 
complexity class NP consists of those computational 
tasks for which a correct answer can be verified in poly- 
nomial time, even if it cannot necessarily be found in 
polynomial time. Remarkably, although this is a fun- 
damental distinction, nobody knows how to prove that 
$\mathrm{P} \ne \mathrm{NP}$: this problem is widely considered to be the
most important in theoretical computer science. 
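
The contrast between checking an answer and finding one can be sketched as follows. Both function names are hypothetical, and trial division is used only as the most naive search strategy, not as the best known factoring method; the checker also does not test that p and q are prime.

```python
def verify_factorization(x: int, p: int, q: int) -> bool:
    """Check a claimed answer: one multiplication and one comparison."""
    return p > 1 and q > 1 and p * q == x


def find_factors_by_trial_division(x: int) -> tuple[int, int]:
    """Search for a factorization by trying d = 2, 3, 4, ...

    The loop may run about sqrt(x) times, which grows exponentially
    with the number of digits of x.
    """
    d = 2
    while d * d <= x:
        if x % d == 0:
            return d, x // d
        d += 1
    raise ValueError("x has no nontrivial factorization")


x = 101 * 103
print(verify_factorization(x, 101, 103))
print(find_factors_by_trial_division(x))
```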

We briefly mention two other important complexity 
classes. PSPACE consists of all problems that can be 
solved using an amount of memory that grows at most 
polynomially with the size of the input. It turns out to 
be the natural class associated with reasonable compu- 
tational strategies for games such as chess. The com- 
plexity class NC is the set of all Boolean functions that 
can be computed by a “circuit of polynomial size and
depth at most a polynomial in log n.” This last class is 
a model for the class of problems that can be solved 
very rapidly using parallel processing. In general, com- 
plexity classes are often surprisingly good at character- 
izing large families of problems with interesting and 
intuitively recognizable features in common. Another 
remarkable fact is that almost all complexity classes
have “hardest problems” within them: that is, problems
for which a solution can be converted into a solution for
any other problem in the class. These problems are said 
to be complete for the class in question. 

These issues, as well as several other complexity 
classes, are discussed in computational complexity 
[IV.21]. A vast number of further classes can be found 
at 

http://qwiki.stanford.edu/wiki/Complexity_Zoo#ac
along with brief definitions of each. 


Continued Fractions 

See THE EUCLIDEAN ALGORITHM AND 
CONTINUED FRACTIONS [III.22] 


III. 11 Countable and Uncountable Sets 


Infinite sets arise all the time in mathematics: the natu- 
ral numbers, the squares, the primes, the integers, the 
rationals, the reals, and so on. It is often natural to try 
to compare the sizes of these sets: intuitively, one feels 
that the set of natural numbers is “smaller” than the 
set of integers (as it contains just the positive ones), 
and much larger than the set of squares (since a typi-
cal large integer is unlikely to be a square). But can we
make comparisons of size in a precise way? 

An obvious method of attack is to build on our intu- 
ition about finite sets. If A and B are finite sets, there 
are two ways we might go about comparing their sizes. 
One is to count their elements: we obtain two nonnega-
tive integers m and n and just look at whether m < n,
m = n, or m > n. But there is another important 
method, which does not require us to know the sizes of 
either A or B. This is to pair off elements from A with 
elements of B until one or other of the sets runs out 
of elements: the first one to run out is the smaller set, 
and if there is a dead heat, then the sets have the same size.

A suitable modification of this second method works 
for infinite sets as well: we can declare two sets to 
be of equal size if there is a one-to-one correspon- 
dence between them. This turns out to be an important 
and useful definition, though it does have some con- 
sequences that seem a little odd at first. For example, 
there is an obvious one-to-one correspondence between 
natural numbers and perfect squares: for each n we let
n correspond to $n^2$. Thus, according to this definition
there are “as many” squares as there are natural num- 
bers. Similarly, we could show that there are as many 
primes as natural numbers by associating n with the 
nth prime number. 1 

What about Z? It seems that it should be “twice as
large” as N, but again we can find a one-to-one corre-
spondence between them. We just list the integers in
the order 0, 1, -1, 2, -2, 3, -3, . . . and then match the
natural numbers with them in the obvious way: 0 with
0, then 1 with 1, then 2 with -1, 3 with 2, 4 with -2,
and so on.


1. There is a notion of “density” according to which the sets of
squares and primes have density 0, the even numbers have density
1/2, and so on for all sufficiently nice sets. This notion can be useful

An infinite set is called countable if it has the same 
size as the natural numbers. As the above example 
shows, this is exactly the same as saying that we can 
list the elements of the set. Indeed, if we have listed a
set A as $a_0, a_1, a_2, \ldots$, then our correspondence is just
to send n to $a_n$. It is worth noting that there are of
course many attempted listings that fail: for example,
for Z we might have tried -3, -2, -1, 0, 1, 2, 3, 4, . . . . So
it is important to recognize that when we say that a
set is countable we are not saying that every attempt 
to list it works, or even that the obvious attempt does: 
we are merely saying that there is some way of listing 
the elements. This is in complete contrast to finite sets, 
where if we attempt to match up two sets and find some 
elements of one set left over, then we know that the 
two sets cannot be in one-to-one correspondence. It is 
this difference that is mainly responsible for the “odd 
consequences” mentioned above. 
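
The listing 0, 1, -1, 2, -2, . . . of the integers can be written as an explicit one-to-one correspondence. The sketch below is one way to do so; the names `nth_integer` and `position_of` are illustrative choices.

```python
def nth_integer(n: int) -> int:
    """Integer matched with the natural number n in the list 0, 1, -1, 2, -2, ..."""
    if n % 2 == 1:
        return (n + 1) // 2   # odd positions carry the positive integers
    return -(n // 2)          # even positions carry 0 and the negative integers


def position_of(z: int) -> int:
    """Inverse map: the position at which the integer z appears in the list."""
    return 2 * z - 1 if z > 0 else -2 * z


print([nth_integer(n) for n in range(7)])
```

Because `position_of` undoes `nth_integer`, every integer appears exactly once in the list, which is what the correspondence requires.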

Now that we have established that some sets that 
seem smaller or larger than N, such as the squares 
or the integers, are actually countable, let us turn to 
a set that seems “much larger,” namely Q. How could 
we hope to list all the rationals? After all, between any
two of them you can find infinitely many others, so it 
seems hard not to leave some of them out when you 
try to list them. However, remarkable as it may seem, 
it is possible to list the rationals. The key idea is that 
listing the rationals whose numerator and denominator 
are both smaller (in modulus) than some fixed number 
k is easy, as there are only finitely many of them. So 
we go through in order: first when both numerator and 
denominator are at most 1, then when they are at most
2, and so on (being careful not to relist any number, so
that for example 1/2 should not also appear as 2/4 or 3/6).
This leads to an ordering such as 0, 1, -1, 2, -2, 1/2, -1/2,
3, -3, 1/3, -1/3, 2/3, -2/3, 3/2, -3/2, . . . .
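
The level-by-level listing of the rationals described above can be sketched as a generator. The order chosen within each level (the new integer first, then the remaining new fractions in increasing order, each followed by its negative) is an assumption made for this illustration; any fixed order within a level would serve equally well.

```python
from fractions import Fraction
from itertools import islice


def list_rationals():
    """Yield every rational number exactly once, level by level.

    Level k contains the fractions p/q with 1 <= p, q <= k. Fraction
    reduces to lowest terms, so 2/4 is recognized as 1/2 and skipped.
    """
    yield Fraction(0)
    seen = set()
    k = 0
    while True:
        k += 1
        level = {Fraction(p, q) for p in range(1, k + 1) for q in range(1, k + 1)}
        # Integers sort before non-integers; ties are broken by size.
        for r in sorted(level - seen, key=lambda r: (r.denominator != 1, r)):
            seen.add(r)
            yield r
            yield -r


print([str(r) for r in islice(list_rationals(), 15)])
```

Any given rational p/q appears by level max(|p|, q) at the latest, so nothing is left out even though the list never ends.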

We could use the same idea to list even larger-looking
sets such as, for example, the algebraic numbers (all
real numbers, such as $\sqrt{2}$, that satisfy a polynomial
equation with integer coefficients). Indeed, we note that
each polynomial has only finitely many roots (which are
therefore listable), so all we need to do is list the polyno-
mials (as then we can go through them, in order, listing
their roots). And we can do that by applying the same
technique again: for each d we list those polynomials of
degree at most d that we have not already listed, with 
coefficients that are at most d in modulus. 


Based on the above examples, one might well guess
that every infinite set is countable. But a beautiful argu-
ment of Cantor [VI.54], called his “diagonal” argu-
ment, shows that the real numbers are not countable.
We imagine that we have a list of all real numbers, say
$r_1, r_2, r_3, \ldots$. Our aim is to show that this list cannot
possibly contain all the reals, so we wish to construct a
real that is not on this list. How do we accomplish this?
We have each $r_i$ written as an infinite decimal, say, and
now we define a new number s as follows. For the first
digit of s (after the decimal point), we choose a digit that
is not the first digit of $r_1$. Note that this already guaran-
tees that s cannot equal $r_1$. (To avoid coincidences with
recurring 9s and the like, it is best to choose this first
digit of s not to be 0 or 9 either.) Then, for the second
digit of s, we choose a digit that is not the second digit
of $r_2$; this guarantees that s cannot be equal to $r_2$. Con-
tinuing in this way, we end up with a real number s that
is not on our list: whatever n is, the number s cannot
be $r_n$, as s and $r_n$ differ in the nth decimal place!
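
The diagonal construction can be tried out on a finite piece of a purported list. In the sketch below each real is represented only by a string of its first few decimal digits, and the rule "use 5, unless the diagonal digit is 5, in which case use 6" is one arbitrary way of choosing a different digit that is never 0 or 9; both are assumptions made for illustration.

```python
def diagonal_digits(rows: list[str]) -> str:
    """Digits of a number s whose i-th digit differs from the i-th digit of row i.

    rows[i] holds the digits after the decimal point of the (i+1)-th real
    in the list; each row must have at least len(rows) digits.
    """
    digits = []
    for i, row in enumerate(rows):
        diagonal_digit = row[i]
        digits.append("6" if diagonal_digit == "5" else "5")
    return "".join(digits)


rows = ["1415926", "7182818", "4142135", "5005000", "3333333", "9999999", "0000000"]
s = diagonal_digits(rows)
print("0." + s)
```

Only finitely many rows can be handled by a program, but the rule for each digit of s depends on a single row, which is why the same recipe makes sense for an infinite list.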

One can use similar arguments any time that we have
“an infinite number of independent choices” to make
in specifying an object (like the various digits of s). For
example, let us use the same ideas to show that the
set of all subsets of N is uncountable. Suppose we have
listed all the subsets as $A_1, A_2, A_3, \ldots$. We will define
a new set B that is not equal to any of the $A_n$. So we
include the point 1 in B if and only if 1 does not belong
to $A_1$ (this guarantees that B is not equal to $A_1$), and
we include 2 in B if and only if 2 does not belong to $A_2$,
and so on. It is amusing to note that one can write this
set B down as $\{n \in N : n \notin A_n\}$, which shows a striking
resemblance to the set in Russell’s paradox.
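
The same construction for subsets of N can be illustrated on a finite list of sets. The function name `diagonal_set` is hypothetical, and the result is only the part of B that lies in {1, . . . , number of listed sets}.

```python
def diagonal_set(subsets: list[set[int]]) -> set[int]:
    """Return {n : n is not in A_n}, where A_n is subsets[n - 1]."""
    return {n for n, a_n in enumerate(subsets, start=1) if n not in a_n}


listed = [{1, 2, 3}, {4}, set(), {1, 4}, {2, 3}]
b = diagonal_set(listed)
print(sorted(b))
```

By construction B and A_n disagree about whether n is a member, so B differs from every set on the list.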

Countable sets are the “smallest” infinite sets. How-
ever, the set of real numbers is by no means the
“largest” infinite set. Indeed, the above argument shows
that no set X can be put into one-to-one correspon-
dence with the set of all its subsets. So the set of all
subsets of the real numbers is “strictly larger” than the 
set of real numbers, and so on. 

The notion of countability is often a very fruitful
one to bear in mind. For example, suppose we want to
know whether or not all real numbers are algebraic. It
is a genuinely hard exercise to write down a particu-
lar real that is transcendental [III.43] (meaning not
algebraic; see Liouville’s theorem and Roth’s theo-
rem [V.25] for an idea of how it can be done), but the
above notions make it utterly trivial that transcenden-
tal numbers exist. Indeed, the set of all real numbers is
uncountable but the set of algebraic numbers is count-
able! Furthermore, this shows that “most” real numbers
are transcendental: the algebraic numbers form only a 
tiny proportion of the reals. 


III. 12 C*-Algebras


A BANACH SPACE [III.64] is both a VECTOR SPACE 
[1.3 §2.3] and a metric space [III.58], and the study of 
Banach spaces is therefore a mixture of linear algebra 
and analysis. However, one can arrive at more sophis- 
ticated mixtures of algebra and analysis if one looks at 
Banach spaces with more algebraic structure. In partic- 
ular, while one can add two elements of a Banach space 
together, one cannot in general multiply them. How- 
ever, sometimes one can: a vector space with a multi- 
plicative structure is called an algebra, and if the vector 
space is also a Banach space, and if the multiplication 
has the property that $\|xy\| \le \|x\|\,\|y\|$ for any two ele-
ments x and y, then it is called a Banach algebra. (This
name does not really reflect historical reality, since the 
basic theory of Banach algebras was not worked out 
by Banach. A more appropriate name might have been 
Gelfand algebras.) 

A C*-algebra is a Banach algebra with an involution,
which means a function that associates with each ele-
ment x another element $x^*$ in such a way that $x^{**} = x$,
$\|x^*\| = \|x\|$, $(x + y)^* = x^* + y^*$, and $(xy)^* = y^*x^*$
for any two elements x and y, and in which the norm
also satisfies the C*-identity $\|x^*x\| = \|x\|^2$. A basic
example of a C*-algebra is the algebra B(H) of all continuous lin-
ear maps T defined on a Hilbert space [III.37] H. The
norm of T is defined to be the smallest constant M such
that $\|Tx\| \le M\|x\|$ for every $x \in H$, and the involution
takes T to its adjoint. This is a map $T^*$ that has the
property that $\langle x, Ty \rangle = \langle T^*x, y \rangle$ for every x and y in
H. (It can be shown that there is exactly one map with
this property.) If H is finite dimensional, then T can be
thought of as an $n \times n$ matrix for some n, and $T^*$ is
then the complex conjugate of the transpose of T.
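
In the finite-dimensional case the defining property of the adjoint can be checked by hand or by machine. The following sketch uses plain Python lists for matrices and vectors; the helper names and the convention that the inner product is conjugate-linear in its first argument are assumptions made for this illustration.

```python
def adjoint(t):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(t)
    return [[t[j][i].conjugate() for j in range(n)] for i in range(n)]


def apply(t, x):
    """Matrix-vector product Tx."""
    return [sum(t[i][j] * x[j] for j in range(len(x))) for i in range(len(t))]


def inner(x, y):
    """Inner product <x, y>, conjugate-linear in x (an assumed convention)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


T = [[1 + 2j, 3], [0, -1j]]
x = [1j, 2]
y = [3, 1 - 1j]
print(inner(x, apply(T, y)), inner(apply(adjoint(T), x), y))
print(adjoint(adjoint(T)) == T)
```

The two inner products agree, and taking the adjoint twice returns the original matrix, in line with the identity $x^{**} = x$.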

A fundamental theorem of Gelfand and Naimark 
states that every C*-algebra can be represented as a
subalgebra of B(H) for some Hilbert space H. For more 
information, see operator algebras [IV.19 §3]. 


III. 13 Curvature 


If you cut an orange in half, scoop out the inside, and
try to flatten one of the resulting hemispheres of peel,
then you will tear it. If you try to flatten a horse’s saddle,
or a soggy potato chip, then you will have the opposite
problem: this time, there is “too much” of the surface
to flatten and you will have to fold it over itself. If, how-
ever, you have a roll of wallpaper and wish to flatten it,
then there is no difficulty: you just unroll it. Surfaces
such as spheres are said to be positively curved, ones
with a saddle-like shape are negatively curved, and ones
like a piece of wallpaper are flat.

Notice that a surface can be flat in this sense even if
it does not lie in a plane. This is because curvature is
defined in terms of the intrinsic geometry of a surface,
where distance is measured in terms of paths that lie 
inside the surface. 

There are various ways of making the above notion 
of curvature precise, and also quantitative, so that with 
each point of a surface one can associate a number that 
tells you “how curved” it is at that point. In order to 
do this, the surface must have a riemannian metric 
[1.3 §6.10] on it, which is used to determine the lengths 
of paths. The notion of curvature can also be general- 
ized to higher dimensions, so that one can talk about 
the curvature at a point of a d-dimensional Rieman-
nian manifold. However, when the dimension is higher
than 2, the way that the manifold can curve at a point 
is more complicated, and is expressed not by a single 
number but by the so-called Ricci tensor. See ricci flow 
[III.80] for more details. 

Curvature is one of the fundamental concepts of 
modern geometry: not only the notion just described 
but also various alternative definitions that measure in 
other ways how far a geometric object deviates from 
being flat. It is also an integral part of the theory 
of general relativity (which is discussed in general 
RELATIVITY AND THE EINSTEIN EQUATIONS [IV.17]).


III. 14 Designs 

Peter J. Cameron 


Block designs were first used in the design of experi- 
ments in statistics, as a method for coping with system- 
atic differences in the experimental material. Suppose, 
for example, that we want to test seven different vari- 
eties of seed in an agricultural experiment, and that we 
have twenty-one plots of land available for the experi- 
ment. If the plots can be regarded as identical, then the 
best strategy is clearly to plant three plots with each 
variety. Suppose, however, that the available plots are 
on seven farms in different regions, with three plots 
on each farm. If we simply plant one variety on each 
farm, we lose information, because we cannot distin- 
guish systematic differences between regions from dif-
ferences in the seed varieties. It is better to follow a
scheme like this: plant varieties 1, 2, 3 on the first farm; 
1, 4, 5 on the second; and then 1, 6, 7; 2, 4, 6; 2, 5, 7; 
3, 4, 7; and 3, 5, 6. This design is represented in figure 1.

This arrangement is called a balanced incomplete-
block design, or BIBD for short. The blocks are the
sets of seed varieties used on the seven farms. The 
blocks are “incomplete” because not every variety can 
be planted on every farm; the design is “balanced” 
because each pair of varieties occur together in a block 
the same number of times (just once in this case). 
This is a (7, 3, 1) design: there are seven varieties; each 
block contains three of them; and two varieties occur 
together in a block once. It is also an example of a 
finite projective plane. Because of the connection with 
geometry, varieties are usually called “points.”
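
The balance condition for the seven-farm scheme is easy to verify mechanically. The sketch below counts, for every pair of varieties, the number of blocks containing both; the names `BLOCKS` and `pair_counts` are illustrative.

```python
from collections import Counter
from itertools import combinations

# The seven blocks (farms) of the scheme described above.
BLOCKS = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]


def pair_counts(blocks):
    """Count how many blocks contain each unordered pair of points."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(BLOCKS)
print(len(counts), set(counts.values()))
```

All 21 pairs of varieties occur, each in exactly one block, which is what makes this a (7, 3, 1) design.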

Mathematicians have developed an extensive theory 
of BIBDs and related classes of designs. Indeed, the 
study of such designs predates their use in statistics. 
In 1847, T. P. Kirkman showed that a (v, 3, 1) design 
exists if and only if v is congruent to 1 or 3 mod 6. (Such 
designs are now called Steiner triple systems, although
Steiner did not pose the problem of their existence until 
1853.) 

Kirkman also posed a more difficult problem. In his 
own words, 

Fifteen young ladies in a school walk out three abreast 
for seven days in succession: it is required to arrange 
them daily so that no two shall walk twice abreast. 

The solution requires a (15,3,1) Steiner triple system 
with the extra property that the thirty-five blocks can 
be partitioned into seven sets called “replicates,” each


replicate consisting of five blocks that partition the set 
of points. Kirkman himself gave a solution, but it was 
not until the late 1960s that Ray-Chaudhuri and Wilson 
showed that (v, 3, 1) designs with this property exist
whenever v is congruent to 3 mod 6. 

For which $v, k, \lambda$ do designs exist? Counting argu-
ments show that, given k and $\lambda$, the values of v
for which $(v, k, \lambda)$ designs exist are restricted to cer-
tain congruence classes. (We noted above that (v, 3, 1)
designs exist only if v is congruent to 1 or 3 mod 6.)
An asymptotic existence theory developed by Richard
Wilson shows that this necessary condition is sufficient
for the existence of a design, apart from finitely many
exceptions, for each value of k and $\lambda$.
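
As a worked instance of such a counting argument, here is the standard double count for triple systems. The symbols r (the number of blocks through a given point) and b (the total number of blocks) are introduced only for this sketch.

```latex
% Fix a point p and count pairs (q, B) with q \ne p and p, q \in B in two ways;
% then count incident (point, block) pairs in two ways.
\begin{align*}
  r(k-1) &= \lambda(v-1), & bk &= vr.
\end{align*}
% Specializing to k = 3 and \lambda = 1:
\begin{align*}
  r &= \frac{v-1}{2}, & b &= \frac{v(v-1)}{6}.
\end{align*}
```

Since r must be an integer, v is odd, so v is congruent to 1, 3, or 5 mod 6. If v were congruent to 5 mod 6, then v(v - 1) would be congruent to 5 · 4 = 20, that is, to 2 mod 6, and b would not be an integer. Hence v is congruent to 1 or 3 mod 6. For v = 7 these formulas give r = 3 and b = 7, and for v = 15 they give b = 35, in agreement with the examples above.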

The concept of design has been further generalized: 
a $t$-$(v, k, \lambda)$ design has the property that any t points
are contained in exactly $\lambda$ blocks. Luc Teirlinck showed
that nontrivial t-designs exist for all t, but examples 
for t > 3 are comparatively rare. 

The statisticians’ concerns are a bit different. In our 
introductory example, if only six farms were available, 
we could not use a BIBD for the experiment, but would 
have to choose the most “efficient” possible design 
(allowing the most information to be obtained from 
the experimental results). A BIBD is most efficient if it 
exists; but not much is known in other cases. 

There are other types of design; these can be impor- 
tant to statistics and also lead to new mathematics. 
Here, for example, is an orthogonal array: in any two
rows of this matrix, each ordered pair of symbols from
{0, 1, 2} occurs just once:

0 0 0 1 1 1 2 2 2
0 1 2 0 1 2 0 1 2
0 1 2 1 2 0 2 0 1
0 2 1 1 0 2 2 1 0

It could be used if we had four different treatments, 
each of which could be applied at three different levels, 
and if we had nine plots available for testing. 
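
The defining property of this orthogonal array can be confirmed by checking every pair of rows. The sketch below stores each row as a string of symbols; the names `ROWS` and `is_orthogonal_array` are illustrative.

```python
from itertools import combinations

# The four rows of the array above, one string per row.
ROWS = ["000111222", "012012012", "012120201", "021102210"]


def is_orthogonal_array(rows, symbols="012"):
    """True if, in any two rows, every ordered pair of symbols occurs exactly once."""
    wanted = sorted(a + b for a in symbols for b in symbols)
    return all(
        sorted(x + y for x, y in zip(first, second)) == wanted
        for first, second in combinations(rows, 2)
    )


print(is_orthogonal_array(ROWS))
```

Reading the columns as plots and the rows as treatments, each column assigns one level of each of the four treatments to one of the nine plots.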

Design theory is closely related to other combina- 
torial topics such as error-correcting codes; indeed, 
Fisher “discovered” the Hamming codes as designs five 
years before R. W. Hamming found them in the context 
of error correction. Other related subjects include pack- 
ing and covering problems, and especially finite geom- 
etry, where many finite versions of classical geometries 
can be regarded as designs.


III. 15 Determinants


The determinant of a $2 \times 2$ matrix

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

is defined to be $ad - bc$. The determinant of a $3 \times 3$
matrix

\[ \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]

is defined to be $aei + bfg + cdh - afh - bdi - ceg$. What
do these expressions have in common, how do they
generalize, and why is the generalization significant?

To begin with the first question, let us make a few 
simple observations. Both expressions are sums and 
differences of products of entries from the matrix. 
Each one of these products contains exactly one ele- 
ment from each row of the matrix and also exactly 
one element from each column. In both cases, a minus 
sign seems to attach itself to the products for which 
the entries selected from the matrix “slope backward” 
rather than forward. 

Up to a point it is easy to see how to extend this
definition to $n \times n$ matrices with $n \ge 4$. We simply
take sums and differences of all possible products of
n entries, where one entry from each row is used and
one from each column. The difficulty comes in deciding
which of these products to add and which to subtract.
To do this we take one of the products and use it to
define a permutation $\sigma$ of the set {1, 2, . . . , n} as fol-
lows. For each $i \le n$, the product contains exactly one
entry in the ith row. If it belongs to the jth column
then $\sigma(i) = j$. The product is added if this permuta-
tion is even and subtracted if it is odd (see permuta-
tion groups [III.70]). So, for example, the permutation
corresponding to the product afh in the $3 \times 3$ determi-
nant above sends 1 to 1, 2 to 3, and 3 to 2. This is an
odd permutation, which is why afh receives a minus
sign.
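
The rule just described translates directly into a short program. The sketch below is meant only to mirror the definition (it takes n! steps, so it is not a practical way to compute large determinants); rows and columns are numbered from 0, and the parity of a permutation is found by counting inversions, which is one standard way to decide whether it is even or odd.

```python
from itertools import permutations


def sign(perm):
    """+1 if the permutation is even, -1 if it is odd (by counting inversions)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(matrix):
    """Determinant as a signed sum over all ways of choosing one entry per row and column."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= matrix[i][perm[i]]  # the entry in row i, column perm[i]
        total += term
    return total


print(det([[1, 2], [3, 4]]))
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))
print(sign((0, 2, 1)))  # the permutation belonging to the product afh
```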

We still need to explain why the particular choice of 
products and minus signs that we have just defined 
is important. The reason is that it tells us something 
about the effect of a matrix when it is considered as a 
linear map. Let A be an $n \times n$ matrix. Then, as explained
in [1.3 §3.2], A specifies a linear map $\alpha$ from $R^n$ to $R^n$.
The determinant of A tells us what this linear map
does to volumes. More precisely, if X is a subset of $R^n$
with n-dimensional volume V, then $\alpha X$, the result of
transforming X using the linear map $\alpha$, will have vol-
ume V times the determinant of A. We could write this
symbolically as follows:

\[ \mathrm{vol}(\alpha X) = \det A \cdot \mathrm{vol}(X). \]

For example, consider the $2 \times 2$ matrix

\[ A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

The corresponding linear map is a rotation of $R^2$
through an angle of $\theta$. Since rotating a shape does not
affect its volume, we should expect the determinant of
A to be 1, and sure enough it is $\cos^2\theta + \sin^2\theta$, which
is 1 by Pythagoras’s theorem.

The above explanation is a slight oversimplifica- 
tion in one respect: determinants can be negative, but 
clearly volumes cannot. If the determinant of a matrix 
is -2, to give an example, it means that the linear map 
multiplies volumes by 2 but also “turns shapes inside 
out” by reflecting them. 

Determinants have many useful properties, which 
become obvious once one knows the above interpre- 
tation in terms of volumes. (However, it is much less 
obvious that this interpretation is correct: in setting 
up the theory of determinants one must do some work 
somewhere.) Let us give three of these properties. 

(i) Let V be a vector space [1.3 §2.3] and let $\alpha : V \to V$
be a linear map. Let $v_1, \ldots, v_n$ be a basis of V and let
A be the matrix of $\alpha$ with respect to this basis. Now let
$w_1, \ldots, w_n$ be another basis of V and let B be the matrix
of $\alpha$ with respect to this different basis. Then A and B
are in general different matrices, but since they both represent
the linear map $\alpha$, they must have the same effect on
volumes. It follows that det(A) = det(B). To put this
another way: the determinant is better thought of as a
property of linear maps rather than of matrices.

Two matrices that represent the same linear map in
the above sense are called similar. It turns out that
A and B are similar if and only if there is an invert-
ible matrix P such that $P^{-1}AP = B$. (An $n \times n$ matrix
P is invertible if there is a matrix Q such that PQ
equals the $n \times n$ identity matrix, $I_n$. It turns out that
QP must then equal $I_n$ as well. If this is true, then Q
is called the inverse of P and is denoted $P^{-1}$.) What we
have just shown is that similar matrices have the same
determinant.

(ii) If A and B are any two $n \times n$ matrices, then they
represent linear maps $\alpha$ and $\beta$ of $R^n$. The product
AB represents the linear map $\alpha\beta$: that is, the linear
map that results from doing $\beta$ followed by $\alpha$. Since $\beta$
multiplies volumes by det B and $\alpha$ multiplies them by
det A, $\alpha\beta$ multiplies them by det A det B. It follows that
det(AB) = det A det B. (The determinant of a product
equals the product of the determinants.)

(iii) If A is an $n \times n$ matrix with determinant 0 and B is any
other $n \times n$ matrix, then AB will, by the multiplicative
property just discussed, have determinant 0 as well. It
follows that AB cannot equal $I_n$, since $I_n$ has determi-
nant 1. Therefore a matrix with determinant 0 is not
invertible. The converse of this turns out to be true as
well: a matrix with nonzero determinant is invertible.
Thus, the determinant gives us a way of finding out
whether a matrix can be inverted.
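
Properties (ii) and (iii) can be tested on small integer matrices. The sketch below recomputes the determinant from the permutation definition so that it is self-contained; the helper names and the particular matrices are chosen only for illustration.

```python
from itertools import permutations
from math import prod


def det(m):
    """Determinant via the signed sum over permutations."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        total += (-1) ** inversions * prod(m[i][perm[i]] for i in range(n))
    return total


def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
B = [[1, 2, 0], [0, 1, 1], [3, 0, 1]]
S = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]  # second row is twice the first, so det S = 0

print(det(matmul(A, B)) == det(A) * det(B))
print(det(S), det(matmul(S, B)))
```

Since det(SB) = 0 whatever B is, no choice of B can make SB equal to the identity matrix, whose determinant is 1.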


III. 16 Differential Forms and Integration 

Terence Tao 


It goes without saying that integration is one of the
fundamental concepts of single-variable calculus. How-
ever, there are in fact three concepts of integration that
appear in the subject: the indefinite integral $\int f$ (also
known as the antiderivative), the unsigned definite inte-
gral $\int_{[a,b]} f(x)\,dx$ (which one would use to find the
area under a curve, or the mass of a one-dimensional
object of varying density), and the signed definite inte-
gral $\int_a^b f(x)\,dx$ (which one would use, for instance, to
compute the work required to move a particle from a to
b). For simplicity we shall restrict our attention here to
functions $f : R \to R$ that are continuous on the entire
real line (and similarly, when we come to differential
forms, we shall discuss only forms that are continu-
ous on the entire domain). We shall also informally use
terminology such as “infinitesimal” in order to avoid
having to discuss the (routine) “epsilon-delta” analyti-
cal issues that one must resolve in order to make these
integration concepts fully rigorous.

These three integration concepts are of course
closely related to each other in single-variable calcu-
lus; indeed, the fundamental theorem of calculus
[1.3 §5.5] relates the signed definite integral $\int_a^b f(x)\,dx$
to any one of the indefinite integrals $F = \int f$ by the
formula

\[ \int_a^b f(x)\,dx = F(b) - F(a), \tag{1} \]

while the signed and unsigned integrals are related by
the simple identity

\[ \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx = \int_{[a,b]} f(x)\,dx, \tag{2} \]

which is valid whenever $a < b$.

When one moves from single-variable calculus to 
several-variable calculus, though, these three concepts 


begin to diverge significantly from each other. The 
indefinite integral generalizes to the notion of a solu- 
tion to a differential equation, or to an integral of a con-
nection, vector field [IV.10 §5], or bundle [IV.10 §5].
The unsigned definite integral generalizes to the
Lebesgue integral [III.57], or more generally to inte-
gration on a measure space. Finally, the signed def- 
inite integral generalizes to the integration of forms, 
which will be our focus here. While these three con- 
cepts are still related to each other, they are not as 
interchangeable as they are in the single-variable set- 
ting. The integration-of-forms concept is of fundamen- 
tal importance in differential topology, geometry, and 
physics, and also yields one of the most important 
examples of cohomology [IV.10 §4], namely de Rham 
cohomology, which (roughly speaking) measures the 
extent to which the fundamental theorem of calculus 
fails in higher dimensions and on general manifolds. 

To provide some motivation for the concept, let us 
informally revisit one of the basic applications of the 
signed definite integral from physics, namely com- 
puting the amount of work required to move a one- 
dimensional particle from point a to point b in the 
presence of an external field. (For example, one might 
be moving a charged particle in an electric field.) At 
the infinitesimal level, the amount of work required to 
move a particle from a point $x_i \in R$ to a nearby point
$x_{i+1} \in R$ is (up to a small error) proportional to the dis-
placement $\Delta x_i = x_{i+1} - x_i$, with the constant of pro-
portionality $f(x_i)$ depending on the initial location $x_i$
of the particle. Thus, the total work required for this is
approximately $f(x_i)\,\Delta x_i$. Note that we do not require
$x_{i+1}$ to be to the right of $x_i$, so the displacement $\Delta x_i$ (or
the infinitesimal work $f(x_i)\,\Delta x_i$) may well be negative.
To return to the noninfinitesimal problem of comput-
ing the work required to move from a to b, we arbi-
trarily select a discrete path $x_0 = a, x_1, x_2, \ldots, x_n = b$
from a to b, and approximate the work as

\[ \int_a^b f(x)\,dx \approx \sum_{i=0}^{n-1} f(x_i)\,\Delta x_i. \tag{3} \]

Again, we do not require $x_{i+1}$ to be to the right of $x_i$;
it is quite possible for the path to “backtrack” repeat-
edly: for instance, one might have $x_i < x_{i+1} > x_{i+2}$ for
some i. However, it turns out that the effect of such
backtracking eventually cancels itself out; regardless
of what path we choose, the expression (3) above con-
verges as the maximum step size tends to zero, and the
limit is the signed definite integral

\[ \int_a^b f(x)\,dx, \tag{4} \]

provided only that the total length $\sum_{i=0}^{n-1} |\Delta x_i|$ of
the path (which controls the amount of backtracking
involved) stays bounded. In particular, in the case when
a = b, so that all paths are closed (i.e., $x_0 = x_n$), we see
that the signed definite integral is zero:

\[ \int_a^a f(x)\,dx = 0. \tag{5} \]

From this informal definition of the signed definite
integral it is obvious that we have the concatenation
formula

\[ \int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx \tag{6} \]

regardless of the relative position of the real numbers
a, b, and c. In particular (setting a = c and using (5))
we conclude that

\[ \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx. \]

Thus if we reverse a path from a to b to form a path
from b to a, the sign of the integral changes. This con-
trasts with the unsigned definite integral $\int_{[a,b]} f(x)\,dx$,
since the set [a, b] of numbers between a and b is
exactly the same as the set of numbers between b and
a. Thus we see that paths are not quite the same as
sets: they carry an orientation which can be reversed,
whereas sets do not.
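
The claims about backtracking and about reversing a path can be observed numerically. The sketch below evaluates the sum of the terms f(x_i) Δx_i along three discrete paths for the function f(x) = x^2 on the interval from 0 to 1; the function names, the choice of f, and the particular backtracking path are all assumptions made for illustration.

```python
def riemann_sum(f, points):
    """Sum of f(x_i) * (x_{i+1} - x_i) along a discrete path x_0, x_1, ..., x_n."""
    return sum(f(x0) * (x1 - x0) for x0, x1 in zip(points, points[1:]))


def straight_path(a, b, n):
    """n equal steps from a to b (decreasing if b < a)."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def backtracking_path(a, b, n):
    """From a to the three-quarter point, back to the quarter point, then on to b."""
    p = a + 0.75 * (b - a)
    q = a + 0.25 * (b - a)
    return straight_path(a, p, n) + straight_path(p, q, n)[1:] + straight_path(q, b, n)[1:]


def f(x):
    return x * x


n = 2000
print(riemann_sum(f, straight_path(0.0, 1.0, n)))      # close to 1/3
print(riemann_sum(f, backtracking_path(0.0, 1.0, n)))  # also close to 1/3
print(riemann_sum(f, straight_path(1.0, 0.0, n)))      # close to -1/3
```

The backtracking path has total length twice that of the straight one, yet the stretch that is traversed three times contributes twice with a positive sign and once with a negative sign, so the net result is unchanged.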

Now let us move from one-dimensional integration
to higher-dimensional integration: that is, from single-
variable calculus to several-variable calculus. It turns
out that there are two objects whose dimensions may
increase: the “ambient space,” 1 which will now be $R^n$
instead of R, and the path, which will now become an
oriented k-dimensional manifold S, over which the inte-
gration will take place. For example, if n = 3 and k = 2,
then one is integrating over a surface that lives in $R^3$.

Let us begin with the case n > 1 and k = 1. Here, we
will be integrating over a continuously differentiable
path (or oriented rectifiable curve) $\gamma$ in $R^n$ starting and
ending at points a and b, respectively. (These points
may or may not be distinct, depending on whether the
path is closed or open.) From a physical point of view,
we are still computing the work required to move from
a to b, but now we are moving in several dimensions
instead of one. In the one-dimensional case, we did not
need to specify exactly which path we used to get from
a to b, because all backtracking canceled itself out.
However, in higher dimensions, the exact choice of the
path $\gamma$ becomes important.


1. We will start with integration on Euclidean spaces $R^n$ for sim-
plicity, although the true power of the integration-of-forms concept is
much more apparent when we integrate on more general spaces, such
as abstract n-dimensional manifolds.

Formally, a path from a to b can be described (or
parametrized) as a continuously differentiable function
$\gamma$ from the unit interval [0, 1] to $\mathbb{R}^n$
such that $\gamma(0) = a$ and $\gamma(1) = b$. For
instance, the line segment from a to b can be parametrized
as $\gamma(t) = (1 - t)a + tb$. This segment also has many
other parametrizations, such as
$\gamma(t) = (1 - t^2)a + t^2 b$; however, as in the
one-dimensional case, the exact choice of parametrization
does not ultimately influence the integral. On the other
hand, the reverse line segment
$(-\gamma)(t) = ta + (1 - t)b$ from b to a is a genuinely
different path; the integral along $-\gamma$ will turn out
to be the negative of the integral along $\gamma$.

As in the one-dimensional case, we will need to
approximate the continuous path $\gamma$ by a discrete path
$x_0 = \gamma(t_0), x_1 = \gamma(t_1), x_2 = \gamma(t_2),
\dots, x_n = \gamma(t_n)$, where $\gamma(t_0) = a$ and
$\gamma(t_n) = b$. Again, we allow some backtracking:
$t_{i+1}$ is not necessarily larger than $t_i$. The
displacement $\Delta x_i = x_{i+1} - x_i \in \mathbb{R}^n$
from $x_i$ to $x_{i+1}$ is now a vector rather than a
scalar. (Indeed, with an eye on the generalization to
manifolds, one should think of $\Delta x_i$ as an
infinitesimal tangent vector to the ambient space
$\mathbb{R}^n$ at the point $x_i$.) In the one-dimensional
case, we converted the scalar displacement $\Delta x_i$
into a new number $f(x_i)\Delta x_i$, which was linearly
related to the original displacement by a proportionality
constant $f(x_i)$ that depended on the position $x_i$. In
higher dimensions, we again have a linear dependence, but
this time, since the displacement is a vector, we must
replace the simple constant of proportionality by a linear
transformation $\omega_{x_i}$ from $\mathbb{R}^n$ to
$\mathbb{R}$. Thus, $\omega_{x_i}(\Delta x_i)$ represents
the infinitesimal “work” required to move from $x_i$ to
$x_{i+1}$. In technical terms, $\omega_{x_i}$ is a linear
functional on the space of tangent vectors at $x_i$, and is
thus a cotangent vector at $x_i$. By analogy with (3), the
net work $\int_\gamma \omega$ required to move from a to b
along the path $\gamma$ is approximated by

$$\int_\gamma \omega \approx \sum_{i=0}^{n-1} \omega_{x_i}(\Delta x_i). \qquad (7)$$

As in the one-dimensional case, one can show that
the right-hand side of (7) converges if the maximum
step size $\sup_{0 \le i \le n-1} |\Delta x_i|$ of the path
converges to zero and the total length
$\sum_{i=0}^{n-1} |\Delta x_i|$ of the path stays
bounded. The limit is written as $\int_\gamma \omega$.
(Recall that we are restricting our attention to continuous
functions. The existence of this limit uses the continuity
of $\omega$.)
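
The sum (7) is easy to try out numerically. The following is only a sketch under stated assumptions: the 1-form $-y\,dx + x\,dy$ on $\mathbb{R}^2$, the endpoints a = (1, 0) and b = (0, 2), and all function names are illustrative choices, not taken from the text. It evaluates (7) for two parametrizations of the same segment, for the reversed segment, and for a different path with the same endpoints.

```python
import math

def omega(x, v):
    """Sample 1-form on R^2 (an assumption): omega_x(v) = -x[1]*v[0] + x[0]*v[1]."""
    return -x[1] * v[0] + x[0] * v[1]

def integrate_1form(omega, gamma, n=2000):
    """Approximate the integral of omega along gamma by the sum (7)."""
    total = 0.0
    for i in range(n):
        x0 = gamma(i / n)
        x1 = gamma((i + 1) / n)
        dx = (x1[0] - x0[0], x1[1] - x0[1])   # displacement Delta x_i
        total += omega(x0, dx)                # omega_{x_i}(Delta x_i)
    return total

a, b = (1.0, 0.0), (0.0, 2.0)

def segment(t):
    # gamma(t) = (1 - t) a + t b
    return ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])

def segment_squared(t):
    # gamma(t) = (1 - t^2) a + t^2 b: same segment, different parametrization
    return segment(t * t)

def segment_reversed(t):
    # (-gamma)(t) = t a + (1 - t) b
    return segment(1 - t)

def arc(t):
    # a different path from a to b (quarter of an ellipse)
    s = t * math.pi / 2
    return (math.cos(s), 2 * math.sin(s))

straight = integrate_1form(omega, segment)
reparametrized = integrate_1form(omega, segment_squared)
backwards = integrate_1form(omega, segment_reversed)
curved = integrate_1form(omega, arc)
print(straight, reparametrized, backwards, curved)
```

For this particular form the two parametrizations of the segment agree, the reversed segment gives the negative, and the curved path gives a different value (close to $\pi$ rather than 2), which illustrates why the choice of path matters in several dimensions.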

The object $\omega$, which continuously assigns^2 a
cotangent vector to each point in $\mathbb{R}^n$, is called
a 1-form, and (7) leads to a recipe for integrating any
1-form $\omega$ on a path $\gamma$. That is, to shift the
emphasis slightly, it allows us to integrate the path
$\gamma$ “against” the 1-form $\omega$. Indeed, it is
useful to think of this integration as a binary operation
(similar in some ways to the dot product) which takes the
curve $\gamma$ and the form $\omega$ as inputs, and returns
a scalar $\int_\gamma \omega$ as output. There is in fact a
“duality” between curves and forms; compare, for instance,
the identity

$$\int_\gamma (\omega_1 + \omega_2) = \int_\gamma \omega_1 + \int_\gamma \omega_2,$$

which expresses (part of) the fundamental fact that
integration of forms is a linear operation, with the
identity

$$\int_{\gamma_1 + \gamma_2} \omega = \int_{\gamma_1} \omega + \int_{\gamma_2} \omega,$$

which generalizes (6) whenever the initial point of
$\gamma_2$ is the final point of $\gamma_1$, where
$\gamma_1 + \gamma_2$ is the concatenation of $\gamma_1$
and $\gamma_2$.^3

Recall that if f is a differentiable function from
$\mathbb{R}^n$ to $\mathbb{R}$, then its derivative at a
point x is a linear map from $\mathbb{R}^n$ to $\mathbb{R}$
(see [I.3 §5.3]). If f is continuously differentiable, then
this linear map depends continuously on x, and can
therefore be thought of as a 1-form, which we denote by df,
writing $df_x$ for the derivative at x. This 1-form can be
characterized as the unique 1-form such that one has the
approximation

$$f(x + v) \approx f(x) + df_x(v)$$

for all infinitesimal v. (More rigorously, the condition
is that $|f(x + v) - f(x) - df_x(v)| / |v| \to 0$ as
$v \to 0$.)

2. More precisely, one can think of $\omega$ as a section
of the cotangent bundle.

3. This duality is best understood using the abstract, and
much more general, formalism of homology and cohomology. In
particular, one can remove the requirement that $\gamma_2$
begins where $\gamma_1$ leaves off by generalizing the
notion of an integral to cover not just integration on
paths, but also integration on formal sums or differences
of paths. This makes the duality between curves and forms
more symmetric.

The fundamental theorem of calculus (1) now generalizes to

$$\int_\gamma df = f(b) - f(a) \qquad (8)$$

whenever $\gamma$ is any oriented curve from a point a to a
point b. In particular, if $\gamma$ is closed, then
$\int_\gamma df = 0$. Note that in order to interpret the
left-hand side of the above equation, we are regarding it
as a particular example of an integral of the form
$\int_\gamma \omega$: in this case, $\omega$ happens to be
the form df. Note also that, with this interpretation, df
has an independent meaning (it is a 1-form) even if it does
not appear under an integral sign.
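
Identity (8) can be checked numerically with the same kind of sum as in (7). The sketch below is illustrative only: the function f, the two curves, and the helper names are assumptions made for the example. It approximates $\int_\gamma df$ along an open curve and along a closed one.

```python
import math

def f(p):
    """Sample scalar function on R^2 (an assumption for this example)."""
    x, y = p
    return x * x * y + math.sin(y)

def df(p, v):
    """The 1-form df: df_p(v) = (df/dx) v[0] + (df/dy) v[1]."""
    x, y = p
    return 2 * x * y * v[0] + (x * x + math.cos(y)) * v[1]

def integrate_1form(omega, gamma, n=100000):
    """Riemann-type sum of omega_{x_i}(Delta x_i) along gamma."""
    total = 0.0
    prev = gamma(0.0)
    for i in range(1, n + 1):
        cur = gamma(i / n)
        total += omega(prev, (cur[0] - prev[0], cur[1] - prev[1]))
        prev = cur
    return total

def open_curve(t):
    # a wiggly curve from (1, 0) to (-1, 1)
    return (math.cos(math.pi * t), t + math.sin(2 * math.pi * t))

def closed_curve(t):
    # the unit circle, traversed once
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))

a, b = open_curve(0.0), open_curve(1.0)
work = integrate_1form(df, open_curve)
loop = integrate_1form(df, closed_curve)
print(work, f(b) - f(a), loop)
```

Up to discretization error, the first two printed numbers agree and the third is zero, as (8) predicts.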

A 1-form whose integral against every sufficiently
small^4 closed curve vanishes is called closed, while a
1-form that can be written as df for some continuously
differentiable function is called exact. Thus, the
fundamental theorem implies that every exact form is
closed. This turns out to be a general fact, valid for all
manifolds. Is the converse true: that is, is every closed
form exact? If the domain is a Euclidean space, or indeed
any other simply connected manifold, then the answer
is yes (this is a special case of the Poincaré lemma), but
it is not true for general domains. In modern terminology,
this demonstrates that the de Rham cohomology of such
domains can be nontrivial.
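
A standard example of a closed form that is not exact, not given in the text and offered here only as an illustration, is the “angle form” $(-y\,dx + x\,dy)/(x^2 + y^2)$ on the plane with the origin removed. The sketch below (helper names are hypothetical) integrates it around a circle that can be shrunk to a point inside the domain and around one that cannot.

```python
import math

def angle_form(p, v):
    """omega_p(v) = (-y v[0] + x v[1]) / (x^2 + y^2), defined away from the origin."""
    x, y = p
    return (-y * v[0] + x * v[1]) / (x * x + y * y)

def integrate_1form(omega, gamma, n=20000):
    """Riemann-type sum of omega_{x_i}(Delta x_i) along gamma."""
    total = 0.0
    prev = gamma(0.0)
    for i in range(1, n + 1):
        cur = gamma(i / n)
        total += omega(prev, (cur[0] - prev[0], cur[1] - prev[1]))
        prev = cur
    return total

def circle(center, radius):
    cx, cy = center
    return lambda t: (cx + radius * math.cos(2 * math.pi * t),
                      cy + radius * math.sin(2 * math.pi * t))

# This loop encircles the missing origin, so it is not contractible in the domain.
around_origin = integrate_1form(angle_form, circle((0.0, 0.0), 1.0))
# This loop stays away from the origin and is contractible in the domain.
away_from_origin = integrate_1form(angle_form, circle((2.0, 0.0), 0.5))
print(around_origin, away_from_origin)
```

The second integral vanishes, as closedness requires, while the first is close to $2\pi$. By (8) an exact form would integrate to zero around every closed curve, so this form cannot be exact on the punctured plane.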

As we have just seen, a 1-form can be thought of as
an object $\omega$ that associates with each path $\gamma$
a scalar, which we denote by $\int_\gamma \omega$. Of
course, $\omega$ is not just any old function from paths to
scalars: it must satisfy the concatenation and reversing
rules discussed earlier, and this, together with our
continuity assumptions, more or less forces it to be
associated with some kind of continuously varying linear
function that can be used, in combination with $\gamma$, to
define an integral. Now let us see if we can generalize
this basic idea from paths to integration on k-dimensional
sets with k > 1. For simplicity we shall stick to the
two-dimensional case, that is, to integration of forms on
(oriented) surfaces in $\mathbb{R}^n$, since this already
illustrates many features of the general case.

Physically, such integrals arise when one is computing
a flux of some field (e.g., a magnetic field) across a
surface. We parametrized one-dimensional oriented curves as
continuously differentiable functions $\gamma$ from the
interval [0, 1] to $\mathbb{R}^n$. It is thus natural to
parametrize two-dimensional oriented surfaces as
continuously differentiable functions $\phi$ defined on the
unit square $[0, 1]^2$. This does not in fact cover all
possible surfaces one wishes to integrate over, but it
turns out that one can cut up more general surfaces into
pieces that can be parametrized using “nice” domains such
as $[0, 1]^2$.

4. The precise condition needed is that the curve should be
contractible, which means that it can be continuously
shrunk down to a point.

In the one-dimensional case, we cut up the oriented
interval [0, 1] into infinitesimal oriented intervals from
$t_i$ to $t_{i+1} = t_i + \Delta t$, which led to
infinitesimal curves from $x_i = \gamma(t_i)$ to
$x_{i+1} = \gamma(t_{i+1}) = x_i + \Delta x_i$. Note that
$\Delta x_i$ and $\Delta t$ are related by the
approximation $\Delta x_i \approx \gamma'(t_i)\Delta t$. In
the two-dimensional case, we will cut up the unit square
$[0, 1]^2$ into infinitesimal squares in an obvious way.^5
A typical one of these will have corners of the form
$(t_1, t_2)$, $(t_1 + \Delta t, t_2)$,
$(t_1, t_2 + \Delta t)$, $(t_1 + \Delta t, t_2 + \Delta t)$.
The surface described by $\phi$ can then be partitioned
into regions with corners $\phi(t_1, t_2)$,
$\phi(t_1 + \Delta t, t_2)$, $\phi(t_1, t_2 + \Delta t)$,
$\phi(t_1 + \Delta t, t_2 + \Delta t)$, each of which
carries an orientation. Since $\phi$ is differentiable, it
is approximately linear at small distance scales, so this
region is approximately an oriented parallelogram in
$\mathbb{R}^n$ with corners $x$, $x + \Delta_1 x$,
$x + \Delta_2 x$, $x + \Delta_1 x + \Delta_2 x$, where
$x = \phi(t_1, t_2)$ and $\Delta_1 x$ and $\Delta_2 x$ are
the infinitesimal vectors

$$\Delta_1 x = \frac{\partial \phi}{\partial t_1}(t_1, t_2)\,\Delta t, \qquad \Delta_2 x = \frac{\partial \phi}{\partial t_2}(t_1, t_2)\,\Delta t.$$

Let us refer to this object as the infinitesimal
parallelogram with dimensions $\Delta_1 x \wedge \Delta_2 x$
and base point x. For now, we will think of the symbol
“$\wedge$” as a mere notational convenience and not try to
interpret it. In order to integrate in a manner analogous
with integration on curves, we now need some sort of
functional $\omega_x$ at this base point that depends
continuously on x. This functional should take the above
infinitesimal parallelogram and return an infinitesimal
number $\omega_x(\Delta_1 x \wedge \Delta_2 x)$, which one
can think of as the amount of “flux” passing through this
parallelogram.

As in the one-dimensional case, we expect $\omega_x$ to
have certain properties. For instance, if you double
$\Delta_1 x$, you double one of the sides of the
infinitesimal parallelogram, so (by the continuity of
$\omega$) the “flux” passing through the parallelogram
should double. More generally,
$\omega_x(\Delta_1 x \wedge \Delta_2 x)$ should depend
linearly on each of $\Delta_1 x$ and $\Delta_2 x$: in other
words, it is bilinear. (This generalizes the linear
dependence in the one-dimensional case.)

Another important property is that

$$\omega_x(\Delta_2 x \wedge \Delta_1 x) = -\omega_x(\Delta_1 x \wedge \Delta_2 x). \qquad (9)$$

That is, the bilinear form $\omega_x$ is antisymmetric.
Again, this has an intuitive explanation: the parallelogram
represented by $\Delta_2 x \wedge \Delta_1 x$ is the same
as that represented by $\Delta_1 x \wedge \Delta_2 x$
except that it has had its orientation reversed, so the
“flux” now counts negatively where it used to count
positively, and vice versa. Another way of seeing this is
to note that if $\Delta_1 x = \Delta_2 x$, then the
parallelogram is degenerate and there should be no flux.
Antisymmetry follows from this and the bilinearity (expand
$\omega_x((u + v) \wedge (u + v)) = 0$ and discard the two
degenerate terms). A 2-form $\omega$ is a continuous
assignment of a functional $\omega_x$ with these properties
to each point x.

5. One could also use infinitesimal oriented rectangles,
parallelograms, triangles, etc.; this leads to an
equivalent concept of the integral.

If $\omega$ is a 2-form and
$\phi : [0, 1]^2 \to \mathbb{R}^n$ is a continuously
differentiable function, we can now define the integral
$\int_\phi \omega$ of $\omega$ “against” $\phi$ (or, more
precisely, the integral against the image under $\phi$ of
the oriented square $[0, 1]^2$) by the approximation

$$\int_\phi \omega \approx \sum_i \omega_{x_i}(\Delta x_{1,i} \wedge \Delta x_{2,i}), \qquad (10)$$

where the image of $\phi$ is (approximately) partitioned
into parallelograms of dimensions
$\Delta x_{1,i} \wedge \Delta x_{2,i}$ based at points
$x_i$. We do not need to decide what order these
parallelograms should be arranged in, because addition is
both commutative and associative. One can show that the
right-hand side of (10) converges to a unique limit as one
makes the partition of parallelograms “increasingly fine,”
though we will not make this precise here.
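
As a rough numerical illustration of (10), here is a sketch under assumptions that are not in the text: the 2-form on $\mathbb{R}^3$ is the flux form of the radial field F(x) = x, that is, $\omega_x(u \wedge v) = F(x) \cdot (u \times v)$, the surface is the upper unit hemisphere, and all function names are hypothetical. The sides of each small parallelogram are approximated by differences of $\phi$ at neighboring grid points.

```python
import math

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def omega(x, u, v):
    """Flux 2-form of the field F(x) = x: omega_x(u ^ v) = F(x) . (u x v)."""
    return dot(x, cross(u, v))

def hemisphere(s, t):
    """Parametrization of the upper unit hemisphere by the unit square."""
    theta, phi = 0.5 * math.pi * s, 2 * math.pi * t
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def integrate_2form(omega, surface, n=200):
    """Sum (10) over an n-by-n grid of small parallelograms."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            s, t = i * h, j * h
            x = surface(s, t)
            d1 = sub(surface(s + h, t), x)   # approximates Delta_1 x
            d2 = sub(surface(s, t + h), x)   # approximates Delta_2 x
            total += omega(x, d1, d2)
    return total

flux = integrate_2form(omega, hemisphere)
# Swapping the two parameters reverses the orientation of the surface.
flipped = integrate_2form(omega, lambda s, t: hemisphere(t, s))
print(flux, flipped)
```

With these choices the flux is close to $2\pi$, the area of the hemisphere, and the reversed orientation gives the negative of that value, in line with the antisymmetry (9).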

We have thus shown how to integrate 2-forms against
oriented two-dimensional surfaces. More generally, one
can define the concept of a k-form on an n-dimensional
manifold (such as $\mathbb{R}^n$) for any $0 \le k \le n$
and integrate this against an oriented k-dimensional
surface in that manifold. For instance, a 0-form on a
manifold X is the same thing as a scalar function
$f : X \to \mathbb{R}$, whose integral on a positively
oriented point x (which is zero dimensional) is f(x), and
on a negatively oriented point x is -f(x). A k-form tells
us how to assign a value to an infinitesimal k-dimensional
parallelepiped with dimensions
$\Delta x_1 \wedge \cdots \wedge \Delta x_k$, and hence to
a portion of k-dimensional “surface,” in much the same way
as we have seen when k = 2. By convention, if $k \ne k'$,
the integral of a k-dimensional form on a k'-dimensional
surface is understood to be zero. We refer to 0-forms,
1-forms, 2-forms, etc. (and formal sums and differences
thereof), collectively as differential forms.

There are three fundamental operations that one can
perform on scalar functions: addition
$(f, g) \mapsto f + g$, pointwise product
$(f, g) \mapsto fg$, and differentiation $f \mapsto df$,
although the last of these is not especially useful unless
f is continuously differentiable. These operations have
various relationships with each other. For instance, the
product is distributive over addition,

$$f(g + h) = fg + fh,$$

and differentiation is a derivation with respect to the
product:

$$d(fg) = (df)g + f(dg).$$

It turns out that one can generalize all three of
these operations to differential forms. Adding a pair
of forms is easy: if $\omega$ and $\eta$ are two k-forms
and $\phi : [0, 1]^k \to \mathbb{R}^n$ is a continuously
differentiable function, then $\int_\phi (\omega + \eta)$
is defined to be $\int_\phi \omega + \int_\phi \eta$. One
multiplies forms using the so-called wedge product. If
$\omega$ is a k-form and $\eta$ is an l-form, then
$\omega \wedge \eta$ is a (k + l)-form. Roughly speaking,
given a (k + l)-dimensional infinitesimal parallelepiped
with base point x and dimensions
$\Delta x_1 \wedge \cdots \wedge \Delta x_{k+l}$, one
evaluates $\omega$ and $\eta$ at the parallelepipeds with
base point x and dimensions
$\Delta x_1 \wedge \cdots \wedge \Delta x_k$ and
$\Delta x_{k+1} \wedge \cdots \wedge \Delta x_{k+l}$,
respectively, and multiplies the results together.
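
For two 1-forms the wedge product can be written down concretely. The sketch below uses the standard antisymmetrized formula $(\alpha \wedge \beta)(u, v) = \alpha(u)\beta(v) - \alpha(v)\beta(u)$, which is an assumption added here: the text only describes the construction roughly and omits the algebra. The helper names and the sample vectors are likewise illustrative.

```python
def one_form(coeffs):
    """Constant 1-form on R^n: v -> sum_i coeffs[i] * v[i]."""
    return lambda v: sum(c * vi for c, vi in zip(coeffs, v))

def wedge(alpha, beta):
    """Wedge of two 1-forms: (alpha ^ beta)(u, v) = alpha(u) beta(v) - alpha(v) beta(u)."""
    return lambda u, v: alpha(u) * beta(v) - alpha(v) * beta(u)

dx = one_form((1, 0, 0))
dy = one_form((0, 1, 0))

u, v = (2, 1, 0), (1, 3, 5)

area_xy = wedge(dx, dy)(u, v)      # signed area of the projection onto the xy-plane
swapped = wedge(dy, dx)(u, v)      # the two factors interchanged
reversed_sides = wedge(dx, dy)(v, u)  # the two sides interchanged
degenerate = wedge(dx, dx)(u, v)   # a 1-form wedged with itself
print(area_xy, swapped, reversed_sides, degenerate)
```

Interchanging either the two forms or the two sides of the parallelogram flips the sign, and wedging a 1-form with itself gives zero.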

As for differentiation, if $\omega$ is a continuously
differentiable k-form, then its derivative $d\omega$ is a
(k + 1)-form that measures something like the “rate of
change” of $\omega$. To see what this might mean, and in
particular to see why $d\omega$ is a (k + 1)-form, let us
think how we might answer
a question of the following kind. We are given a spheri- 
cal surface in R 3 and a flow, and we would like to know 
the net flux out of the surface: that is, the difference 
between the amount of flux coming out and the amount 
going in. One way to do this would be to approximate 
the surface of the sphere by a union of tiny parallelo- 
grams, to measure the flux through each one, and to 
take the sum of all these fluxes. Another would be to 
approximate the solid sphere by a union of tiny paral- 
lelepipeds, to measure the net flux out of each of these, 
and to add up the results. If a parallelepiped is small 
enough, then we can closely approximate the net flux 
out of it by looking at the difference, for each pair of 
opposite faces, between the amount coming out of the 
parallelepiped through one and the amount going into 
it through the other, and this will depend on the rate 
of change of the 2-form.

The process of summing up the net fluxes out of the 
parallelepipeds is more rigorously described as
integrating a 3-form over the solid sphere. In this way, one
can see that it is natural to expect that information 
about how a 2-form varies should be encapsulated in 
a 3-form. 

The exact construction of these operations requires
a little bit of algebra and is omitted here. However,
we remark that they obey similar laws to their scalar
counterparts, except that there are some sign changes
that are ultimately due to the antisymmetry (9). For
instance, if $\omega$ is a k-form and $\eta$ is an l-form,
the commutative law for multiplication becomes

$$\omega \wedge \eta = (-1)^{kl}\, \eta \wedge \omega,$$

basically because kl swaps are needed to interchange k
dimensions with l dimensions; and the derivation rule
for differentiation becomes

$$d(\omega \wedge \eta) = (d\omega) \wedge \eta + (-1)^k \omega \wedge (d\eta).$$

Another rule is that the differentiation operator d is
nilpotent:

$$d(d\omega) = 0. \qquad (11)$$

This may seem rather unintuitive, but it is fundamen- 
tally important. To see why it might be expected, let 
us think about differentiating a 1-form twice. The orig- 
inal 1-form associates a scalar with each small line seg- 
ment. Its derivative is a 2-form that associates a scalar 
with each small parallelogram. This scalar essentially 
measures the sum of the scalars given by the 1-form 
as you go around the four edges of the parallelogram, 
though to get a sensible answer when you pass to the 
limit you have to divide by the area of the parallelo- 
gram. If we now repeat the process, we are looking at 
a sum of the six scalars associated with the six faces 
of a parallelepiped. But each of these scalars in turn 
comes from a sum of the scalars associated with the 
four directed edges around the corresponding face, and 
each edge is therefore counted twice (as it belongs to 
two faces), once in each direction. Therefore, the con- 
tributions from each edge cancel and the sum of all 
contributions is zero. 
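
The cancellation described above can be checked by brute force on the unit cube. In the following sketch (the vertex labeling and the helper name `face_cycle` are illustrative choices, not from the text), each of the six faces is given its outward orientation, its four directed edges are listed, and the net number of times each edge is traversed is counted.

```python
from collections import Counter

def face_cycle(axis, side):
    """Corners of the cube face x[axis] == side, listed so that the
    boundary runs counterclockwise when seen from outside the cube."""
    j, k = (axis + 1) % 3, (axis + 2) % 3
    corners = []
    for a, b in [(0, 0), (1, 0), (1, 1), (0, 1)]:
        p = [0, 0, 0]
        p[axis], p[j], p[k] = side, a, b
        corners.append(tuple(p))
    if side == 0:
        corners.reverse()  # opposite face: flip so the normal still points outward
    return corners

# net[(p, q)] = signed number of times the edge p -> q is traversed
net = Counter()
for axis in range(3):
    for side in (0, 1):
        cycle = face_cycle(axis, side)
        for m in range(4):
            p, q = cycle[m], cycle[(m + 1) % 4]
            if p < q:
                net[(p, q)] += 1
            else:
                net[(q, p)] -= 1

print(len(net), set(net.values()))
```

All twelve edges appear, and each has net count zero: every edge is traversed once in each direction. Whatever scalars a 1-form assigns to the edges, their total over the six face boundaries therefore vanishes.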

The description given earlier of the relationship
between integrating a 2-form over the surface of a
sphere and integrating its derivative over the solid
sphere can be thought of as a generalization of the
fundamental theorem of calculus, and can itself be
generalized considerably: Stokes’s theorem is the assertion
that

$$\int_S d\omega = \int_{\partial S} \omega \qquad (12)$$

for any oriented manifold S and form $\omega$, where
$\partial S$ is the oriented boundary of S (which we will
not define here). Indeed one can view this theorem as a
definition of the derivative operation
$\omega \mapsto d\omega$; thus, differentiation is the
adjoint of the boundary operation. (For instance, the
identity (11) is dual to the geometric observation that
the boundary $\partial S$ of an oriented manifold itself
has no boundary: $\partial(\partial S) = \emptyset$.) As a
particular case of Stokes’s theorem, we see that
$\int_S d\omega = 0$ whenever S is a closed manifold, i.e.,
one with no boundary. This observation lets one extend the
notions of closed and exact forms to general differential
forms, which (together with (11)) allows one to fully set
up de Rham cohomology.

We have already seen that 0-forms can be identified
with scalar functions. Also, in Euclidean spaces one can
use the inner product to identify linear functionals with
vectors, and therefore 1-forms can be identified with
vector fields. In the special (but very physical) case of
three-dimensional Euclidean space $\mathbb{R}^3$, 2-forms
can also be identified with vector fields via the famous
right-hand rule,^6 and 3-forms can be identified with
scalar functions by a variant of this rule. (This is an
example of a concept known as Hodge duality.) In this case,
the differentiation operation $\omega \mapsto d\omega$ can
be identified with the gradient operation
$f \mapsto \nabla f$ when $\omega$ is a 0-form, with the
curl operation $X \mapsto \nabla \times X$ when $\omega$ is
a 1-form, and with the divergence operation
$X \mapsto \nabla \cdot X$ when $\omega$ is a 2-form. Thus,
for instance, the rule (11) implies that
$\nabla \times \nabla f = 0$ and
$\nabla \cdot (\nabla \times X) = 0$ for any suitably
smooth scalar function f and vector field X, while various
cases of Stokes’s theorem (12), with this interpretation,
become the various theorems about integrals of curves and
surfaces in three dimensions that you may have seen
referred to as “the divergence theorem,” “Green’s theorem,”
and “Stokes’s theorem” in a course on several-variable
calculus.
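
The two vector-calculus identities that come from (11) can be observed with finite differences. This is a sketch only: the sample function f, the sample field X, the step size H, and all helper names are assumptions made for the example. Central differences in different coordinates commute, so what the code really exhibits is the same pairwise cancellation that makes d(d$\omega$) vanish.

```python
import math

H = 1e-3  # finite-difference step (an arbitrary choice)

def partial(g, i, p):
    """Central-difference estimate of dg/dx_i at the point p (g is scalar valued)."""
    lo, hi = list(p), list(p)
    lo[i] -= H
    hi[i] += H
    return (g(tuple(hi)) - g(tuple(lo))) / (2 * H)

def grad(f):
    return lambda p: tuple(partial(f, i, p) for i in range(3))

def curl(X):
    def curl_at(p):
        def d(i, j):
            # estimate of dX_j/dx_i at p
            return partial(lambda q: X(q)[j], i, p)
        return (d(1, 2) - d(2, 1), d(2, 0) - d(0, 2), d(0, 1) - d(1, 0))
    return curl_at

def div(X):
    return lambda p: sum(partial(lambda q, i=i: X(q)[i], i, p) for i in range(3))

def f(p):
    x, y, z = p
    return math.sin(x * y) + x * z * z

def X(p):
    x, y, z = p
    return (y * z * z, math.sin(x) * z, x * y + math.exp(z))

p = (0.3, -0.7, 1.1)
curl_of_grad = curl(grad(f))(p)
div_of_curl = div(curl(X))(p)
print(curl_of_grad, div_of_curl)
```

Both results are zero up to rounding error.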

Just as the signed definite integral is connected to
the unsigned definite integral in one dimension via
(2), there is a connection between integration of
differential forms and the Lebesgue (or Riemann) integral.
On the Euclidean space $\mathbb{R}^n$ one has the n
standard coordinate functions
$x_1, x_2, \dots, x_n : \mathbb{R}^n \to \mathbb{R}$.
Their derivatives $dx_1, \dots, dx_n$ are then 1-forms on
$\mathbb{R}^n$. Taking their wedge product, one obtains an
n-form $dx_1 \wedge \cdots \wedge dx_n$. We can multiply
this with any (continuous) scalar function
$f : \mathbb{R}^n \to \mathbb{R}$ to obtain another n-form
$f\, dx_1 \wedge \cdots \wedge dx_n$. If $\Omega$ is any
open bounded domain in $\mathbb{R}^n$, we then have the
identity

$$\int_\Omega f(x)\, dx_1 \wedge \cdots \wedge dx_n = \int_\Omega f(x)\, dx,$$

where on the left-hand side we have an integral of a
differential form (with $\Omega$ viewed as a positively
oriented n-dimensional manifold) and on the right-hand side
we have the Riemann or Lebesgue integral of f on $\Omega$.
If we give $\Omega$ the negative orientation, we have to
reverse the sign of the left-hand side. This correspondence
generalizes (2).

6. This is an entirely arbitrary convention; one could just
as easily have used the left-hand rule to provide this
identification, and apart from some harmless sign changes
here and there, one gets essentially the same theory as a
consequence.

There is one last operation on forms that is worth
pointing out. Suppose we have a continuously
differentiable map $\Phi : X \to Y$ from one manifold to
another (we allow X and Y to have different dimensions).
Then of course every point x in X pushes forward to a
point $\Phi(x)$ in Y. Similarly, if we let $v \in T_x X$ be
an infinitesimal tangent vector to X based at x, then this
tangent vector also pushes forward to a tangent vector
$\Phi_* v \in T_{\Phi(x)} Y$ based at $\Phi(x)$; informally
speaking, $\Phi_* v$ can be defined by requiring the
infinitesimal approximation
$\Phi(x + v) = \Phi(x) + \Phi_* v$. One can write
$\Phi_* v = D\Phi(x)(v)$, where
$D\Phi : T_x X \to T_{\Phi(x)} Y$ is the derivative of the
several-variable map $\Phi$ at x. Finally, any
k-dimensional oriented manifold S in X also pushes forward
to a k-dimensional oriented manifold $\Phi(S)$ in Y,
although in some cases (e.g., if the image of $\Phi$ has
dimension less than k) this pushed-forward manifold may be
degenerate.

We have seen that integration is a duality pairing
between manifolds and forms. Since manifolds push
forward under $\Phi$ from X to Y, we expect forms to pull
back from Y to X. Indeed, given any k-form $\omega$ on Y,
we can define the pull-back $\Phi^*\omega$ as the unique
k-form on X such that we have the change-of-variables
formula

$$\int_{\Phi(S)} \omega = \int_S \Phi^*(\omega).$$

In the case of 0-forms (i.e., scalar functions), the
pull-back $\Phi^* f : X \to \mathbb{R}$ of a scalar
function $f : Y \to \mathbb{R}$ is given explicitly by
$\Phi^* f(x) = f(\Phi(x))$, while the pull-back of a 1-form
$\omega$ is given explicitly by the formula

$$(\Phi^*\omega)_x(v) = \omega_{\Phi(x)}(\Phi_* v).$$

Similar definitions can be given for other differential
forms. The pull-back operation enjoys several nice
properties: for instance, it respects the wedge product,

$$\Phi^*(\omega \wedge \eta) = (\Phi^*\omega) \wedge (\Phi^*\eta),$$

and the derivative,

$$d(\Phi^*\omega) = \Phi^*(d\omega).$$

By using these properties, one can recover rather 
painlessly the change-of-variables formulas in several- 
variable calculus. Moreover, the whole theory carries 
effortlessly over from Euclidean spaces to other man- 
ifolds. It is because of this that the theory of differ- 
ential forms and integration is an indispensable tool 
in the modern study of manifolds, and especially in 
DIFFERENTIAL TOPOLOGY [IV.9]. 
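
The change-of-variables formula for a 1-form can be tested numerically. The sketch below rests on assumptions chosen for the example: $\Phi$ is the polar-coordinate map from the $(r, \theta)$-plane to the $(x, y)$-plane, $\omega = -y\,dx + x\,dy$, S is a straight path in the $(r, \theta)$-plane, and the function names are hypothetical. It compares the integral of $\omega$ over the pushed-forward path with the integral of the pull-back $(\Phi^*\omega)_x(v) = \omega_{\Phi(x)}(\Phi_* v)$ over the original path.

```python
import math

def integrate_1form(omega, gamma, n=20000):
    """Riemann-type sum of omega_{x_i}(Delta x_i) along gamma."""
    total = 0.0
    prev = gamma(0.0)
    for i in range(1, n + 1):
        cur = gamma(i / n)
        total += omega(prev, (cur[0] - prev[0], cur[1] - prev[1]))
        prev = cur
    return total

def Phi(x):
    """Polar-coordinate map (r, theta) -> (r cos theta, r sin theta)."""
    r, th = x
    return (r * math.cos(th), r * math.sin(th))

def push_forward(x, v):
    """Phi_* v = D Phi(x)(v), using the Jacobian of the polar map."""
    r, th = x
    return (math.cos(th) * v[0] - r * math.sin(th) * v[1],
            math.sin(th) * v[0] + r * math.cos(th) * v[1])

def omega(y, w):
    """The 1-form -y dx + x dy on the target plane."""
    return -y[1] * w[0] + y[0] * w[1]

def pullback(form):
    """(Phi^* form)_x(v) = form_{Phi(x)}(Phi_* v)."""
    return lambda x, v: form(Phi(x), push_forward(x, v))

def S(t):
    # straight path in the (r, theta)-plane from (1, 0) to (2, pi)
    return (1.0 + t, math.pi * t)

over_image = integrate_1form(omega, lambda t: Phi(S(t)))
over_source = integrate_1form(pullback(omega), S)
print(over_image, over_source)
```

Here the pull-back works out to $r^2\,d\theta$, so both numbers approximate $7\pi/3$.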
III.17 Dimension


What is the difference between a two-dimensional set 
and a three-dimensional set? A rough answer that one 
might give is that a two-dimensional set lives inside a 
plane, while a three-dimensional set fills up a portion of
space. Is this a good answer? For many sets it does seem 
to be: triangles, squares, and circles can be drawn in a 
plane, while tetrahedra, cubes, and spheres cannot. But 
how about the surface of a sphere? This we would nor- 
mally think of as two dimensional, contrasting it with 
the solid sphere, which is three dimensional. But the 
surface of a sphere does not live inside a plane. 

Does this mean that our rough definition was incorrect?
Not exactly. From the perspective of linear algebra, the
set $\{(x, y, z) : x^2 + y^2 + z^2 = 1\}$, which is the
surface of a sphere of radius 1 in $\mathbb{R}^3$ centered
at the origin, is three dimensional, precisely because it
is not contained in a plane. (One can express this in
algebraic language by saying that the affine subspace
generated by the sphere is the whole of $\mathbb{R}^3$.)
However, this sense
of “three dimensional” does not do justice to the rough 
idea that the surface of a sphere has no thickness. 
Surely there ought to be another sense of dimension 
in which the surface of a sphere is two dimensional? 

As this example illustrates, dimension, though very 
important throughout mathematics, is not a single con- 
cept. There turn out to be many natural ways of gener- 
alizing our ideas about the dimensions of simple sets 
such as squares and cubes, and they are often incom- 
patible with one another, in the sense that the dimen- 
sion of a set may vary according to which definition 
you use. The remainder of this article will set out a few 
different definitions. 

One very basic idea we have about the dimension of a 
set is that it is “the number of coordinates you need to 
specify a point.” We can use this to justify our instinct 
that the surface of a sphere is two dimensional: you 
can specify any point by giving its longitude and
latitude. It is a little tricky to turn this idea into a
rigorous mathematical definition because you can in fact
specify a point of the sphere by means of just one number
if you do not mind doing it in a highly artificial
way. This is because you can take any two numbers
and interleave the digits to form a single number from
which the original two numbers can be recovered. For
instance, from the two numbers $\pi$ = 3.141592653...
and e = 2.718281828... you can form the number
32.174118529821685238..., and by taking alternate
digits you get back $\pi$ and e again. It is even possible
to find a continuous function f from the closed interval
[0, 1] (that is, the set of all real numbers between 0
and 1, inclusive) to the surface of a sphere that takes
every value.
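
The interleaving trick is easy to carry out on truncated decimal expansions. The sketch below is an illustration only: it works on strings, assumes that the two inputs have integer parts of equal length and fractional parts of equal length, and uses function names invented for the example. Genuine real numbers have infinite expansions, which the code does not attempt to handle.

```python
def interleave(a: str, b: str) -> str:
    """Interleave the digits of two decimal strings such as '3.14' and '2.71'."""
    a_int, a_frac = a.split(".")
    b_int, b_frac = b.split(".")
    whole = "".join(x + y for x, y in zip(a_int, b_int))
    frac = "".join(x + y for x, y in zip(a_frac, b_frac))
    return whole + "." + frac

def split_back(c: str):
    """Recover the two original strings by taking alternate digits."""
    c_int, c_frac = c.split(".")
    first = c_int[0::2] + "." + c_frac[0::2]
    second = c_int[1::2] + "." + c_frac[1::2]
    return first, second

combined = interleave("3.141592653", "2.718281828")
recovered = split_back(combined)
print(combined)
print(recovered)
```

The combined string reproduces the digits 32.174118529821685238 quoted above, and splitting it returns the two truncated expansions.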

We therefore have to decide what we mean by a “natural”
coordinate system. One way of making this decision leads
to the definition of a manifold, a very important concept
that is discussed in [I.3 §6.9] and also in
differential topology [IV.9]. This is based on the
idea that every point in the sphere is contained in a
neighborhood N that “looks like” a piece of the plane,
in the sense that there is a “nice” one-to-one
correspondence $\phi$ between N and a subset of the
Euclidean plane $\mathbb{R}^2$. Here, “nice” can have
different meanings: typical ones are that $\phi$ and its
inverse should both be continuous, or differentiable, or
infinitely differentiable.

Thus, the intuitive notion that a d-dimensional set 
is one where you need d numbers to specify a point 
can be developed into a rigorous definition that tells 
us, as we had hoped, that the surface of a sphere is two 
dimensional. Now let us take another intuitive notion 
and see what we can get from it. 

Suppose I want to cut a piece of paper into two pieces. 
The boundary that separates the pieces will be a curve, 
which we would normally like to think of as one dimen- 
sional. Why is it one dimensional? Well, we could use 
the same reasoning: if you cut a curve into two pieces 
then the part where the two pieces meet each other is 
a single point (or pair of points if the curve is a loop), 
which is zero dimensional. That is, there appears to be 
a sense in which a (d - 1)-dimensional set is needed if
you want to cut a d-dimensional set into two. 

Let us try to be slightly more precise about this idea. 
Suppose that X is a set and x and y are points in X. 
Let us call a set Y a barrier between x and y if there 
is no continuous path from x to y that avoids Y. For 
example, if X is a solid sphere of radius 2, x is the 
center of X, and y is a point on the boundary of X, then 
one possible barrier between x and y is the surface of 
a sphere of radius 1. With this terminology in place, we 
can make the following inductive definition. A finite set 
is zero dimensional, and in general we say that X is at
most d dimensional if between any two points in X there
is a barrier that is at most (d - 1) dimensional. We also
say that X is d dimensional if it is at most d dimensional 
but not at most (d - 1) dimensional. 

The above definition makes sense, but it runs into
difficulties: one can construct a pathological set X that
acts as a barrier between any two points in the plane,
but contains no segment of any curve. This makes X
zero dimensional and therefore makes the plane one
dimensional, which is not satisfactory. A small
modification to the above definition eliminates such
pathologies and gives a definition that was put forward by
Brouwer [VI.75]. A complete metric space [III.58] X
is said to have dimension at most d if, given any pair
of disjoint closed sets A and B, you can find disjoint
open sets U and V with $A \subset U$ and $B \subset V$ such
that the complement Y of $U \cup V$ (that is, everything in
X that does not belong to either U or V) has dimension
at most d - 1. The set Y is the barrier: the main
difference is that we have now asked for it to be closed.
The induction starts with the empty set, which has
dimension -1. Brouwer’s definition is known as the
inductive dimension of a set.

Figure 1. How to cover with squares so that no four overlap.

Here is another basic idea that leads to a useful
definition of dimension, proposed by Lebesgue [VI.72].
Suppose you want to cover an open interval of real 
numbers (that is, an interval that does not contain its 
endpoints) with shorter open intervals. Then you will be 
forced to make the shorter ones overlap, but you can 
do it in such a way that no point is contained in more 
than two of your intervals: just start each new interval 
close to the end of the previous one. 

Now suppose that you want to cover an open square 
(that is, one that does not contain its boundary) with 
smaller open squares. Again you will be forced to make 
the smaller squares overlap, but this time the situation 
is slightly worse: some points will have to be contained 
in three squares. However, if you take squares arranged 
like bricks, as in figure 1, and expand them slightly, 
then you can do the covering in such a way that no 
four squares overlap. In general, it seems that to cover
a typical d-dimensional set with small open sets, you
need to have overlaps of d + 1 sets but you do not need
to have overlaps greater than this.
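
The brick arrangement can be checked by sampling. In the sketch below the grid size, the amount eps by which each brick is enlarged, and the function names are arbitrary choices made for the illustration. Unit squares are laid in rows, every second row is shifted by half a unit, each square is slightly expanded, and we count how many squares contain each sample point of the region being covered.

```python
def brick_cover(rows, cols, eps):
    """Open squares (x0, x1, y0, y1): unit bricks, odd rows shifted by 1/2,
    each enlarged by eps on every side."""
    squares = []
    for r in range(rows):
        shift = 0.5 * (r % 2)
        for c in range(-1, cols + 1):
            squares.append((c + shift - eps, c + shift + 1 + eps,
                            r - eps, r + 1 + eps))
    return squares

def multiplicity(squares, x, y):
    """Number of squares of the cover that contain the point (x, y)."""
    return sum(1 for (x0, x1, y0, y1) in squares
               if x0 < x < x1 and y0 < y < y1)

squares = brick_cover(rows=3, cols=3, eps=0.1)

# Sample the open square (0, 3) x (0, 3) on a grid with spacing 0.05.
counts = [multiplicity(squares, i * 0.05, j * 0.05)
          for i in range(1, 60) for j in range(1, 60)]
print(min(counts), max(counts))
```

Every sample point lies in at least one square and none lies in more than three, which matches the picture: triple overlaps occur near the corners of the bricks, but no four squares overlap.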

The precise definition that this leads to is surprisingly
general: it makes sense not just for subsets of
$\mathbb{R}^n$ but even for an arbitrary topological space
[III.92]. We say that a set X is at most d dimensional if,
however you cover X with a finite collection of open sets
$U_1, \dots, U_n$, you can find a finite collection of open
sets $V_1, \dots, V_m$ with the following properties:

(i) the sets $V_i$ also cover the whole of X;

(ii) every $V_i$ is a subset of at least one $U_j$;

(iii) no point is contained in more than d + 1 of the $V_i$.

If X is a metric space, then we can choose our $U_i$ to
have small diameter, thereby forcing the $V_i$ to be small.
So this definition is basically saying that it is possible
to cover X with open sets with no d + 2 of them
overlapping, and that these open sets can be as small as
you like.

As with inductive dimension, we then define the 
dimension of X to be the smallest d such that X is at 
most d dimensional. And again it can be shown that 
this definition assigns the “correct” dimension to the 
familiar shapes of elementary geometry. 

A fourth intuitive idea leads to concepts known as 
homological and cohomological dimension. Associated 
with any suitable topological space X, such as a
manifold, are sequences of groups known as homology
and cohomology groups [IV.10 §4]. Here we will
discuss homology groups, but a very similar discus- 
sion is possible for cohomology. Roughly speaking, the 
nth homology group tells you how many interestingly 
different continuous maps there are from closed n- 
dimensional manifolds M to X. If X is a manifold of 
dimension less than n, then it can be shown that the 
nth homology group is trivial: in a sense, there is not 
enough room in X to define any map that is
interestingly different from a constant map. On the other hand,
the nth homology group of the n-sphere itself is Z, 
which says that one can classify the maps from the 
n-sphere to itself by means of an integer parameter. 

It is therefore tempting to say that a space is at least 
n dimensional if there is room inside it for interest- 
ing maps from n-dimensional manifolds. This thought 
leads to a whole class of definitions. The homological 
dimension of a structure X is defined to be the largest 
n for which some substructure of X has a nontrivial
nth homology group. (It is necessary to consider sub- 
structures, because homology groups can also be trivial 
when there is too much room: it then becomes easy to
deform a continuous map and show that it is equiva- 
lent to a constant map.) However, homology is a very 
general concept and there are many different homology 
theories, so there are many different notions of homo- 
logical dimension. Some of these are geometric, but 
there are also homology theories for algebraic struc- 
tures: for example, using suitable theories, one can 
define the homological dimension of algebraic struc- 
tures such as rings [III.83 §1] or groups [I.3 §2.1]. This
is a very good example of geometrical ideas having an 
algebraic payoff. 

Now let us turn to a fifth and final (for this article at 
least) intuitive idea about dimension, namely the way it 
affects how we measure size. If you want to convey how 
big a shape X is, then a good way of doing so is to give 
the length of X if X is one dimensional, the area if it is 
two dimensional, and the volume if it is three dimen- 
sional. Of course, this presupposes that you already 
know what the dimension is, but, as we shall see, there 
is a way of deciding which measure is the most appro- 
priate without determining the dimension in advance. 
Then the tables are turned: we can actually define the 
dimension to be the number that corresponds to the 
best measure. 

To do this, we use the fact that length, area, and vol-
ume scale in different ways when you expand a shape.
If you take a curve and expand it by a factor of 2 (in all 
directions), then its length doubles. More generally, if 
you expand by a factor of C, then the length multiplies 
by C. However, if you take a two-dimensional shape and 
expand it by C, then its area multiplies by $C^2$. (Roughly
speaking, this is because each little portion of the shape 
expands by C “in two directions” so you have to mul- 
tiply the area by C twice.) And the volume of a three-
dimensional shape multiplies by $C^3$: for instance, the
volume of a sphere of radius 3 is twenty-seven times 
the volume of a sphere of radius 1. 

It may look as though we still have to decide in 
advance whether we will talk about length, area, or vol- 
ume before we can even begin to think about how the 
measurement scales when we expand the shape. But 
this is not the case. For instance, if we expand a square 
by a factor of 2, then we obtain a new square that can 
be divided up into four congruent copies of the original 
square. So, without having decided in advance that we 
are talking about area, we can say that the size of the 
new square is four times that of the old square. 

This observation has a remarkable consequence: 
there are sets to which it is natural to assign a dimen-
sion that is not an integer! Perhaps the simplest exam-
ple is a famous set first defined by Cantor [VI.54] and
now known as the Cantor set. This set is produced as
follows. You start with the closed interval [0, 1], and
call it $X_0$. Then you form a set $X_1$ by removing the mid-
dle third of $X_0$: that is, you remove all points between
1/3 and 2/3, but leave 1/3 and 2/3 themselves. So $X_1$ is the
union of the closed intervals [0, 1/3] and [2/3, 1]. Next, you
remove the middle thirds of these two closed intervals
to produce a set $X_2$, so $X_2$ is the union of the intervals

[0, 1/9], [2/9, 1/3], [2/3, 7/9], and [8/9, 1].

In general, $X_n$ is a union of closed intervals, and $X_{n+1}$
is what you get by removing the middle thirds of each
of these intervals, so $X_{n+1}$ consists of twice as many
intervals as $X_n$, but they are a third of the size. Once you
have produced the sequence $X_0, X_1, X_2, \dots$, you define
the Cantor set to be the intersection of all the $X_n$: that
is, all the real numbers that remain, no matter how 
far you go with the process of removing middle thirds 
of intervals. It is not hard to show that these are pre- 
cisely the numbers whose ternary expansions consist 
just of Os and 2s. (There are some numbers that have 
two different ternary expansions. For instance, 1/3 can
be written either as 0.1 or as 0.02222.... In such cases
we take the recurring expansion rather than the ter-
minating one. So 1/3 belongs to the Cantor set.) Indeed,
when you remove middle thirds for the nth time, you 
are removing all numbers that have a 1 in the nth place 
after the “decimal” (in fact, ternary) point.
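
The construction is easy to carry out exactly with rational arithmetic. The sketch below builds the intervals of the nth stage and reads off ternary digits; the function names remove_middle_thirds, cantor_stage, and ternary_digits are illustrative, and ternary_digits always produces the terminating expansion when there are two.

```python
from fractions import Fraction

def remove_middle_thirds(intervals):
    """Replace each closed interval [a, b] by its left and right thirds."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def cantor_stage(n):
    """The closed intervals whose union is the nth stage, starting from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = remove_middle_thirds(intervals)
    return intervals

def ternary_digits(x, n):
    """First n ternary digits of x in [0, 1), using the terminating expansion."""
    digits = []
    for _ in range(n):
        x *= 3
        d = int(x)      # integer part, which is the next digit
        digits.append(d)
        x -= d
    return digits

for a, b in cantor_stage(2):
    print(a, b, ternary_digits(a, 2))
```

The left endpoints of the intervals at stage n have only 0s and 2s among their first n ternary digits, in line with the description above.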

The Cantor set has many interesting properties. For 
example, it is uncountable [III.11], but it also has mea-
sure [III.57] zero. Briefly, the first of these assertions
follows from the fact that there is a different element
of the Cantor set for every subset A of the natural num-
bers (just take the ternary number $0.a_1a_2a_3\dots$, where
$a_i = 2$ whenever $i \in A$ and $a_i = 0$ otherwise), and there
are uncountably many subsets of the natural numbers.
To justify the second, note that the total length of the
intervals making up $X_n$ is $(2/3)^n$ (since one removes a
third of $X_{n-1}$ to produce $X_n$). Since the Cantor set is
contained in every $X_n$, its measure must be smaller
than $(2/3)^n$, whatever n is, which means that it must be
zero. Thus, the Cantor set is very large in one respect 
and very small in another. 

A further property of the Cantor set is that it is self- 
similar. The set X\ consists of two intervals, and if you 
look at just one of these intervals as the middle thirds 
are repeatedly removed, then what you see is just like 
the construction of the whole Cantor set, but scaled 
down by a factor of 3. That is, the Cantor set consists 
of two copies of itself, each scaled down by a factor
of 3. From this we deduce the following statement: if 
you expand the Cantor set by a factor of 3, then you can 
divide the expanded set up into two congruent copies 
of the original, so it is “twice as big.” 

What consequence should this have for the dimen- 
sion of the Cantor set? Well, if the dimension is d, then 
the expanded set ought to be $3^d$ times as big. There-
fore, $3^d$ should equal 2. This means that d should be
log 2/log 3, which is roughly 0.63.
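
The same calculation works for any shape that splits into N congruent copies of itself when expanded by a factor of C: solving $C^d = N$ gives d = log N/log C. A small Python sketch of this arithmetic follows; the function name similarity_dimension is an illustrative choice, and the cube (eight copies under expansion by 2) is added here as a further example alongside the square and the Cantor set from the text.

```python
import math

def similarity_dimension(copies, scale):
    """Solve scale**d == copies for d."""
    return math.log(copies) / math.log(scale)

print(similarity_dimension(4, 2))   # square: four copies when expanded by 2
print(similarity_dimension(8, 2))   # cube: eight copies when expanded by 2
print(similarity_dimension(2, 3))   # Cantor set: two copies when expanded by 3
```

For the square and the cube this recovers the usual dimensions (up to rounding), while for the Cantor set it gives the noninteger value discussed above.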

Once one knows this, the mystery of the Cantor set 
is lessened. As we shall see in a moment, a theory of 
fractional dimension can be developed with the use- 
ful property that a countable union of sets of dimen- 
sion at most d has dimension at most d. Therefore, the 
fact that the Cantor set has dimension greater than 0
implies that it cannot be countable (since single points
have dimension 0). On the other hand, because the
dimension of the Cantor set is less than 1, it is much
smaller than a one-dimensional set, so it is no surprise 
that its measure is zero. (This is a bit like saying that 
a surface has no volume, but now the two dimensions 
are 0.63 and 1 instead of 2 and 3.) 

The most useful theory of fractional dimension is 
one developed by Hausdorff [VI.68]. One begins with
a concept known as Hausdorff measure, which is a nat- 
ural way of assessing the “d-dimensional volume” of a 
set, even if d is not an integer. Suppose you have a curve 
in $\mathbb{R}^3$ and you want to work out its length by consider-
ing how easy it is to cover it with spheres. A first idea
might be to say that the length was the smallest you 
could make the sum of the diameters of the spheres. 
But this does not work: you might be lucky and find that 
a long curve was tightly wrapped up, in which case you 
could cover it with a single sphere of small diameter. 

However, this would no longer be possible if your 
spheres were required to be small. Suppose, therefore, 
that we require all the diameters of the spheres to be 
at most $\delta$. Let $L(\delta)$ be the smallest we can then get the
sum of the diameters to be. The smaller $\delta$ is, the less
flexibility we have, so the larger $L(\delta)$ will be. Therefore,
$L(\delta)$ tends to a (possibly infinite) limit L as $\delta$ tends to 0,
and we call L the length of the curve.

Now suppose that we have a smooth surface in $\mathbb{R}^3$
and want to deduce its area from information about 
covering it with spheres. This time, the area that you 
can cover with a very small sphere (so small that it 
meets only one portion of the surface and that por- 
tion is almost flat) will be roughly proportional to the 
square of the diameter of the sphere. But that is the only
detail we need to change: let $A(\delta)$ be the smallest we
can make the sum of the squares of the diameters of a
set of spheres that cover the surface, if all those spheres
have diameter at most $\delta$. Then declare the area of the
surface to be the limit of $A(\delta)$ as $\delta$ tends to 0. (Strictly
speaking, we ought to multiply this limit by $\pi/4$, but
then we get a definition that does not generalize easily.) 

We have just given a way of defining length and area
for shapes in $\mathbb{R}^3$. The only difference between the two
was that for length we considered the sum of the diam-
eters of small spheres, while for area we considered the
sum of the squares of the diameters of small spheres.
In general, we define the d-dimensional Hausdorff mea-
sure in a similar way, but considering the sum of the
dth powers of the diameters.

We can use the concept of Hausdorff measure to give 
a rigorous definition of fractional dimension. It is not 
hard to show that for any shape X there will be exactly
one appropriate d, in the following sense: if c is less
than d, then the c-dimensional Hausdorff measure of
X is infinite, while if c is greater than d, then it is 0.
(For instance, the c-dimensional Hausdorff measure of
a smooth surface is infinite if c < 2 and 0 if c > 2.) This
d is called the Hausdorff dimension of the set X. Haus-
dorff dimension is very useful for analyzing fractal sets,
which are discussed further in dynamics [IV.15].
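
For the Cantor set one can watch this threshold appear, at least for one natural family of covers. The nth stage of the construction covers the set by $2^n$ intervals of diameter $3^{-n}$, so the sum of the cth powers of the diameters is $(2/3^c)^n$. The Python sketch below evaluates this sum; it is only a heuristic, since Hausdorff measure takes the smallest sum over all covers by small sets and these particular covers give upper bounds only. The function name cover_sum is illustrative.

```python
import math

def cover_sum(n, c):
    """Sum of c-th powers of the diameters of the 2**n intervals of length 3**-n."""
    return (2 ** n) * (3.0 ** (-n * c))

d = math.log(2) / math.log(3)
for c in (0.5, d, 0.8):
    print(round(c, 3), [cover_sum(n, c) for n in (10, 20, 40)])
```

For c above log 2/log 3 the sums shrink toward 0 as n grows, for c below it they grow without bound, and at c = log 2/log 3 they stay equal to 1 (up to rounding).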

It is important to realize that the Hausdorff dimen- 
sion of a set need not equal its topological dimension. 
For example, the Cantor set has topological dimen- 
sion zero and Hausdorff dimension log 2 / log 3 . A larger 
example is a very wiggly curve known as the Koch 
snowflake. Because it is a curve (and a single point is 
enough to cut it into two) it has topological dimen- 
sion 1. However, because it is very wiggly, it has infi-
nite length, and its Hausdorff dimension is in fact
log 4/ log 3. 


III.18 Distributions

Terence Tao 


A function is normally defined to be an object $f : X \to Y$
which assigns to each point x in a set X, known as the
domain, a point f(x) in another set Y, known as the
range (see the language and grammar of mathe-
matics [I.2 §2.2]). Thus, the definition of functions is
set-theoretic and the fundamental operation that one
can perform on a function is evaluation: given an ele-
ment x of X, one evaluates f at x to obtain the element
f(x) of Y.
However, there are some fields of mathematics where
this may not be the best way of describing functions. 
In geometry, for instance, the fundamental property of 
a function is not necessarily how it acts on points, but 
rather how it pushes forward or pulls back objects that
are more complicated than points (e.g., other functions, 
bundles [IV.10 §5] and sections, schemes [IV.6 §3] and 
sheaves, etc.). Similarly, in analysis, a function need not
necessarily be defined by what it does to points, but 
may instead be defined by what it does to objects of 
different kinds, such as sets or other functions; the for- 
mer leads to the notion of a measure, the latter to that 
of a distribution. 

Of course, all these notions of function and function- 
like objects are related. In analysis, it is helpful to think 
of the various notions of a function as forming a spec-
trum, with very “smooth” classes of functions at one
end and very “rough” ones at the other. The smooth 
classes of functions are very restrictive in their mem- 
bership: this means that they have good properties, and 
there are many operations that one can perform on 
them (such as, for example, differentiation), but it also 
means that one cannot necessarily ensure that the func- 
tions one is working with belong to this category. Con- 
versely, the rough classes of functions are very general 
and inclusive: it is easy to ensure that one is working 
with them, but the price one pays is that the number of 
operations one can perform on these functions is often 
sharply reduced (see function spaces [III.29]).

Nevertheless, the various classes of functions can 
often be treated in a unified manner, because it is 
often possible to approximate rough functions arbitrar- 
ily well (in an appropriate topology [III.92]) by smooth 
ones. Then, given an operation that is naturally defined 
for smooth functions, there is a good chance that there 
will be exactly one natural way to extend it to an opera- 
tion on rough functions: one takes a sequence of better 
and better smooth approximations to the rough func- 
tions, performs the operation on them, and passes to 
the limit. 

Distributions, or generalized functions, belong at the 
rough end of the spectrum, but before we say what
they are, it will be helpful to begin by considering some 
smoother classes of functions, partly for comparison 
and partly because one obtains rough classes of func- 
tions from smooth ones by a process known as duality.
A linear functional defined on a space E of functions
is simply a linear map $\phi$ from E to the scalars $\mathbb{R}$ or $\mathbb{C}$.
Typically, E is a normed space, or at least comes with a
topology, and the dual space is the space of continuous
linear functionals.

The class $C^\omega[-1, 1]$ of analytic functions. These are in
many ways the “nicest” functions of all, and include 
many familiar functions such as exp(x), sin(x), poly- 
nomials, and so on. However, we shall not discuss them 
further, because for many purposes they form too rigid 
a class to be useful. (For example, if an analytic func- 
tion is zero everywhere on an interval, then it is forced 
to be zero everywhere.) 

The class $C_c^\infty[-1, 1]$ of test functions. These are the
smooth (that is, infinitely differentiable) functions f,
defined on the interval [-1, 1], that vanish on neighbor-
hoods of 1 and -1. (That is, one can find $\delta > 0$ such that
f(x) = 0 whenever $x > 1 - \delta$ or $x < -1 + \delta$.) They are
more numerous than analytic functions and therefore 
more tractable for analysis. For instance, it is often use- 
ful to construct smooth “cutoff functions,” which are 
functions that vanish outside some small set but do not 
vanish inside it. Also, all the operations from calculus 
(differentiation, integration, composition, convolution, 
evaluation, etc.) are available for these functions. 

The class $C^0[-1, 1]$ of continuous functions. These func-
tions are regular enough for the notion of evaluation,
$x \mapsto f(x)$, to make sense for every $x \in [-1, 1]$, and
one can integrate such functions and perform algebraic 
operations such as multiplication and composition, but 
they are not regular enough that operations such as dif- 
ferentiation can be performed on them. Still, they are 
usually considered among the smoother examples of 
functions in analysis. 

The class $L^2[-1, 1]$ of square-integrable functions. These
are measurable functions $f : [-1, 1] \to \mathbb{R}$ for which
the Lebesgue integral $\int_{-1}^{1} |f(x)|^2\,dx$ is finite. Usually
one regards two such functions f and g as equal if
the set of x such that $f(x) \neq g(x)$ has measure zero.
(Thus, from the set-theoretic point of view, the object
in question is really an equivalence class [I.2 §2.3]
of functions.) Since a singleton {x} has measure zero,
we can change the value of f(x) without changing the
function. Thus, the notion of evaluation does not make
sense for a square-integrable function f(x) at any spe-
cific point x. However, two functions that differ on a
set of measure zero have the same Lebesgue integral
[III.57], so integration does make sense.

A key point about this class is that it is self-dual 
in the following sense. Any two functions in this 
class can be paired together by the inner product 
$\langle f, g\rangle = \int_{-1}^{1} f(x)g(x)\,dx$. Therefore, given a function
$g \in L^2[-1, 1]$, the map $f \mapsto \langle f, g\rangle$ defines a linear func-
tional on $L^2[-1, 1]$, which turns out to be continuous.
Moreover, given any continuous linear functional $\phi$ on
$L^2[-1, 1]$, there is a unique function $g \in L^2[-1, 1]$ such
that $\phi(f) = \langle f, g\rangle$ for every f. This is a special case of
one of the Riesz representation theorems. 

The class $C^0[-1, 1]^*$ of finite Borel measures. Any finite
Borel measure [III.57] $\mu$ gives rise to a continuous
linear functional on $C^0[-1, 1]$ defined by
$f \mapsto \langle \mu, f\rangle = \int_{-1}^{1} f(x)\,d\mu$.
Another of the Riesz representation theo-
rems says that every continuous linear functional on
$C^0[-1, 1]$ arises in this way, so one could in principle
define a finite Borel measure to be a continuous linear
functional on $C^0[-1, 1]$.

The class $C_c^\infty([-1, 1])^*$ of distributions. Just as mea-
sures can be viewed as continuous linear functionals
on $C^0([-1, 1])$, a distribution $\mu$ is a continuous linear
functional on $C_c^\infty([-1, 1])$ (with an appropriate topol-
ogy). Thus, a distribution can be viewed as a “virtual
function”: it cannot itself be directly evaluated, or even
integrated over an open set, but it can still be paired
with any test function $g \in C_c^\infty([-1, 1])$, producing a
number $\langle \mu, g\rangle$. A famous example is the Dirac distribu-
tion $\delta_0$, defined as the functional which, when paired
with any test function g, returns the evaluation g(0)
of g at zero: $\langle \delta_0, g\rangle = g(0)$. Similarly, we have the
derivative of the Dirac distribution, $-\delta_0'$, which, when
paired with any test function g, returns the derivative
g'(0) of g at zero: $\langle -\delta_0', g\rangle = g'(0)$. (The reason for
the minus sign will be given later.) Since test functions 
have so many operations available to them, there are 
many ways to define continuous linear functionals, so 
the class of distributions is quite large. Despite this, 
and despite the indirect, virtual nature of distributions,
one can still define many operations on them; we shall 
discuss this later. 

The class $C^\omega([-1, 1])^*$ of hyperfunctions. There are
classes of functions more general still than distribu-
tions. For instance, there are hyperfunctions, which
roughly speaking one can think of as linear function-
als that can be tested only against analytic functions
$g \in C^\omega([-1, 1])$ rather than against test functions
$g \in C_c^\infty([-1, 1])$. However, as the class of analytic func-
tions is so sparse, hyperfunctions tend not to be as
useful in analysis as distributions. 

At first glance, the concept of a distribution has lim-
ited utility, since all a distribution $\mu$ is empowered to do
is to be tested against test functions g to produce inner
products $\langle \mu, g\rangle$. However, using this inner product, one
can often take operations that are initially defined only
on test functions, and extend them to distributions by
duality. A typical example is differentiation. Suppose
one wants to know how to define the derivative $\mu'$ of
a distribution, or in other words how to define $\langle \mu', g\rangle$
for any test function g and distribution $\mu$. If $\mu$ is itself
a test function $\mu = f$, then we can evaluate this using
integration by parts (recalling that test functions vanish
at -1 and 1). We have

$$\langle f', g\rangle = \int_{-1}^{1} f'(x)g(x)\,dx = -\int_{-1}^{1} f(x)g'(x)\,dx = -\langle f, g'\rangle.$$

Note that if g is a test function, then so is g'. We can 
therefore generalize this formula to arbitrary distribu-
tions by defining $\langle \mu', g\rangle = -\langle \mu, g'\rangle$. This is the justifi-
cation for the differentiation of the Dirac distribution:
$\langle \delta_0', g\rangle = -\langle \delta_0, g'\rangle = -g'(0)$.
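
The integration-by-parts identity behind this definition can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses the standard bump function exp(-1/(1 - x^2)), rescaled and shifted so that both f and g vanish near -1 and 1, and it approximates the integrals by a midpoint rule. The names bump, dbump, pair, and pair_with_delta_prime are illustrative and do not come from the text.

```python
import math

def bump(x):
    """Smooth function that is positive on (-1, 1) and zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def dbump(x):
    """Derivative of bump, obtained from the chain rule."""
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2 if abs(x) < 1 else 0.0

# Two test functions on [-1, 1] and their derivatives.
def f(x):  return bump(2 * x)              # supported on [-0.5, 0.5]
def df(x): return 2 * dbump(2 * x)
def g(x):  return bump(2 * x - 0.3)        # supported on [-0.35, 0.65]
def dg(x): return 2 * dbump(2 * x - 0.3)

def pair(u, v, steps=20000):
    """Midpoint-rule approximation to the integral of u(x) v(x) over [-1, 1]."""
    h = 2.0 / steps
    xs = (-1.0 + (i + 0.5) * h for i in range(steps))
    return h * sum(u(x) * v(x) for x in xs)

lhs = pair(df, g)     # approximates <f', g>
rhs = -pair(f, dg)    # approximates -<f, g'>
print(lhs, rhs)

def pair_with_delta_prime(dtest):
    """<delta_0', g> = -g'(0); the argument is the derivative g' of the test function."""
    return -dtest(0.0)

print(pair_with_delta_prime(dg))
```

The two printed pairings agree to within the accuracy of the quadrature, and the last line evaluates the derivative of the Dirac distribution on g using nothing but g'.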

More formally, what we have done here is to com- 
pute the adjoint of the differentiation operation (as 
defined on the dense space of test functions). Then 
we have taken adjoints again to define the differenti- 
ation operation for general distributions. This proce- 
dure is well-defined and also works for many other con- 
cepts; for instance, one can add two distributions, mul- 
tiply a distribution by a smooth function, convolve two 
distributions, and compose distributions on both left 
and right with suitably smooth functions. One can even 
take Fourier transforms of distributions. For instance, 
the Fourier transform of the Dirac delta $\delta_0$ is the con-
stant function 1, and vice versa (this is essentially
the Fourier inversion formula), while the distribution
$\sum_{n \in \mathbb{Z}} \delta_0(x - n)$ is its own Fourier transform (this is
essentially the Poisson summation formula). Thus the 
space of distributions is quite a good space to work in, 
in that it contains a large class of functions (e.g., all 
measures and integrable functions), and is also closed 
under a large number of common operations in analy- 
sis. Because the test functions are dense in the space 
of distributions, the operations as defined on distribu- 
tions are usually compatible with those on test func- 
tions. For instance, if f and g are test functions and
f' = g in the sense of distributions, then f' = g will
also be true in the classical sense. This often allows 
one to manipulate distributions as if they were test 
functions without fear of confusion or inaccuracy. The 
main operations one has to be careful about are evalua- 
tion and pointwise multiplication of distributions, both 
of which are usually not well-defined (e.g., the square
of the Dirac delta distribution is not well-defined as a 
distribution). 

Another way to view distributions is as the weak limit 
of test functions. A sequence of functions $f_n$ is said to
converge weakly to a distribution $\mu$ if $\langle f_n, g\rangle \to \langle \mu, g\rangle$
for all test functions g. For instance, if $\phi$ is a test func-
tion with total integral $\int \phi = 1$, then the test func-
tions $f_n(x) = n\phi(nx)$ can be shown to converge
weakly to the Dirac delta distribution $\delta_0$, while the func-
tions $f_n(x) = n^2\phi'(nx)$ converge weakly to the derivative
$\delta_0'$ of the Dirac delta. On the other hand, the functions
$g_n(x) = \cos(nx)\phi(x)$ converge weakly to zero (this is
a variant of the Riemann-Lebesgue lemma). Thus weak 
convergence has some unusual features not present in 
stronger notions of convergence, in that severe oscil- 
lations can sometimes “disappear” in the limit. One 
advantage of working with distributions instead of 
smoother functions is that one often has some com- 
pactness in the space of distributions under weak lim- 
its (e.g., by the Banach-Alaoglu theorem). Thus, distri- 
butions can be thought of as asymptotic extremes of 
behavior of smoother functions, just as real numbers 
can be thought of as limits of rational numbers. 
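
Weak convergence of this kind can be observed numerically. The Python sketch below is a rough illustration under stated assumptions: the test function called phi is a normalized bump function, g is another bump function chosen arbitrarily, and integrals over [-1, 1] are approximated by a midpoint rule. The names bump, pair, phi, approx_delta, and oscillating are illustrative.

```python
import math

def bump(x):
    """Smooth function that is positive on (-1, 1) and zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def pair(u, v, steps=20000):
    """Midpoint-rule approximation to the integral of u(x) v(x) over [-1, 1]."""
    h = 2.0 / steps
    xs = (-1.0 + (i + 0.5) * h for i in range(steps))
    return h * sum(u(x) * v(x) for x in xs)

# Normalizing constant, so that phi has total integral 1.
Z = pair(lambda x: bump(2 * x), lambda x: 1.0)

def phi(x):
    return bump(2 * x) / Z

def g(x):
    """An arbitrary test function, supported on [-0.7, 0.9]."""
    return bump((x - 0.1) / 0.8)

def approx_delta(n):
    """The function x -> n * phi(n x)."""
    return lambda x: n * phi(n * x)

def oscillating(n):
    """The function x -> cos(n x) * phi(x)."""
    return lambda x: math.cos(n * x) * phi(x)

for n in (5, 50):
    print(n, pair(approx_delta(n), g), g(0.0))
print(pair(oscillating(200), g))
```

As n grows, the pairing of n phi(nx) with g approaches g(0), which is the value that the Dirac delta assigns to g, while the pairing of the rapidly oscillating function with g is close to zero.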

Because distributions can be easily differentiated, 
while still being closely connected to smoother func- 
tions, they have been extremely useful in the study of 
partial differential equations (PDEs), particularly when 
the equations are linear. For instance, the general solu- 
tion to a linear PDE can often be described in terms 
of its fundamental solution, which solves the PDE in 
the sense of distributions. More generally, distribution 
theory (together with related concepts, such as that of 
a weak derivative) gives an important (though certainly 
not the only) means to define generalized solutions of 
both linear and nonlinear PDEs. As the name suggests, 
these generalize the concept of smooth (or classical)
solutions by allowing the formation of singularities, 
shocks, and other nonsmooth behavior. In some cases 
the easiest way to construct a smooth solution to a PDE 
is first to construct a generalized solution and then to 
use additional arguments to show that the generalized 
solution is in fact smooth.


III.19 Duality


Duality is an important general theme that has manifes- 
tations in almost every area of mathematics. Over and 
over again, it turns out that one can associate with a 
given mathematical object a related, “dual” object that
helps one to understand the properties of the object
one started with. Despite the importance of duality in 
mathematics, there is no single definition that covers 
all instances of the phenomenon. So let us look at a 
few examples and at some of the characteristic features 
that they exhibit. 

1 Platonic Solids 

Suppose you take a cube, draw points at the centers of 
each of its six faces, and let those points be the ver- 
tices of a new polyhedron. The polyhedron you get will 
be a regular octahedron. What happens if you repeat the
process? If you draw a point at the center of each of 
the eight faces of the octahedron, you will find that 
these points are the eight vertices of a cube. We say that 
the cube and the octahedron are dual to one another. 
The same can be done for the other Platonic solids: 
the dodecahedron and the icosahedron are dual to one 
another, while the dual of a tetrahedron is again a 
tetrahedron. 

The duality just described does more than just split 
up the five Platonic solids into three groups: it allows us 
to associate statements about a solid with statements 
about its dual. For instance, two faces of a dodecahe- 
dron are adjacent if they share an edge, and this is so 
if and only if the corresponding vertices of the dual 
icosahedron are linked by an edge. And for this reason 
there is also a correspondence between edges of the 
dodecahedron and edges of the icosahedron. 

2 Points and Lines in the Projective Plane 

There are several equivalent definitions of the projec-
tive plane [I.3 §6.7]. One, which we shall use here, is
that it is the set of all lines in $\mathbb{R}^3$ that go through the
origin. These lines we call the “points” of the projec- 
tive plane. In order to visualize this set as a geometri- 
cal object and to make its “points” more point-like, it 
is helpful to associate each line through the origin with 
the pair of points in $\mathbb{R}^3$ at which it intersects the unit
sphere: indeed, one can define the projective plane as 
the unit sphere with opposite points identified. 

A typical “line” in the projective plane is the set of 
all “points” (that is, lines through the origin) that lie in 
some plane through the origin. This is associated with 
the great circle in which that plane intersects the unit 
sphere, once again with opposite points identified. 

There is a natural association between lines and 
points in the projective plane: each point P is associated 
with the line L that consists of all points orthogonal to 
P, and each line L is associated with the single point P
that is orthogonal to all points in L. For example, if P is 
the z-axis, then the associated projective line L is the set 
of all lines through the origin that lie in the xy-plane, 
and vice versa. This association has the following basic 
property: if a point P belongs to a line L, then the line 
associated with P contains the point associated with L. 

This allows us to translate statements about points 
and lines into logically equivalent statements about 
lines and points. For example, three points are collinear 
(that is, they all lie in a line) if and only if the corre- 
sponding lines are concurrent (that is, there is some 
point that is contained in all of them). In general, once 
you have proved a theorem in projective geometry, you 
get another, dual, theorem for free (unless the dual 
theorem turns out to be the same as the original one). 
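
In coordinates this duality is very concrete. In the Python sketch below, which is an illustration rather than anything taken from the text, a projective point is represented by a nonzero vector spanning the corresponding line through the origin, and a projective line by a nonzero normal vector of the corresponding plane through the origin. With these conventions the association described above simply rereads the same triple of numbers. The function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def det3(a, b, c):
    return dot(a, cross(b, c))

def incident(point, line):
    """A point lies on a line iff its direction is orthogonal to the line's normal."""
    return dot(point, line) == 0

def collinear(p, q, r):
    """Three points lie on one line iff their directions are linearly dependent."""
    return det3(p, q, r) == 0

def concurrent(l, m, n):
    """Three lines share a point iff their normals are linearly dependent."""
    return det3(l, m, n) == 0

def dual(v):
    """Same coordinates, reread as a line if v was a point and as a point if v was a line."""
    return v

p, q, r = (1, 0, 1), (0, 1, 1), (1, 1, 2)
line_pq = cross(p, q)          # the line through p and q
print(collinear(p, q, r), concurrent(dual(p), dual(q), dual(r)))
print(incident(p, line_pq), incident(dual(line_pq), dual(p)))
```

Collinearity of three points and concurrency of the three associated lines are tested by the same determinant, which is why each statement translates into the other.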

3 Sets and Their Complements 

Let X be a set. If A is any subset of X, then the com-
plement of A, written $A^c$, is the set of all elements of X
that do not belong to A. The complement of the com-
plement of A is clearly A, so there is a kind of dual-
ity between sets and their complements. De Morgan’s
laws are the statements that $(A \cap B)^c = A^c \cup B^c$ and
$(A \cup B)^c = A^c \cap B^c$; they tell us that complementation
“turns intersections into unions,” and vice versa. Notice
that if we apply the first law to $A^c$ and $B^c$, then we find
that $(A^c \cap B^c)^c = A \cup B$. Taking complements of both
sides of this equality gives us the second law.

Because of de Morgan’s laws, any identity involv- 
ing unions and intersections remains true when you 
interchange them. For example, one useful identity is 
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$. Applying this to the
complements of the sets and using de Morgan’s laws, it
is straightforward to deduce the equally useful identity
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.
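
For a small finite set these identities can be confirmed by brute force. The Python sketch below checks de Morgan's laws and the two distributive identities over all subsets of a four-element set; the set X and the function names are illustrative, and an exhaustive check on one small set is of course not a proof.

```python
from itertools import combinations

X = frozenset(range(4))

def complement(A):
    return X - A

def subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def de_morgan_holds():
    return all(complement(A & B) == complement(A) | complement(B)
               and complement(A | B) == complement(A) & complement(B)
               for A in subsets(X) for B in subsets(X))

def distributive_laws_hold():
    return all(A | (B & C) == (A | B) & (A | C)
               and A & (B | C) == (A & B) | (A & C)
               for A in subsets(X) for B in subsets(X) for C in subsets(X))

print(de_morgan_holds(), distributive_laws_hold())
```

Interchanging | and & in the first distributive identity produces the second, which is the interchange of unions and intersections described above.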

4 Dual Vector Spaces 

Let V be a vector space [I.3 §2.3], over R, say. The dual
space V* is defined to be the set of all linear functionals 
on V: that is, linear maps from V to R. It is not hard to 
define appropriate notions of addition and scalar mul- 
tiplication and show that these make V* into a vector 
space as well. 

Suppose that T is a linear map [I.3 §4.2] from a vec-
tor space V to a vector space W. If we are given an ele-
ment w* of the dual space W*, then we can use T and
w* to create an element of V* as follows: it is the map
that takes v to the real number w*(Tv). This map,
which is denoted by T*w*, is easily checked to be lin-
ear. The function T* is itself a linear map, called the
adjoint of T, and it takes elements of W* to elements 
of V*. 

This is a typical feature of duality: a function f from
object A to object B very often gives rise to a function 
g from the dual of B to the dual of A. 

Suppose that $T^*$ is a surjection. Then if $v \neq v'$,
we can find $v^*$ such that $v^*(v) \neq v^*(v')$, and then
$w^* \in W^*$ such that $T^*w^* = v^*$, so that
$T^*w^*(v) \neq T^*w^*(v')$, and hence $w^*(Tv) \neq w^*(Tv')$. This
implies that $Tv \neq Tv'$, which proves that T is an injec-
tion. We can also prove that if $T^*$ is an injection, then
T is a surjection. Indeed, if T is not a surjection, then
TV is a proper subspace of W, which allows us to find
a nonzero linear functional $w^*$ such that $w^*(Tv) = 0$
for every $v \in V$, and hence such that $T^*w^* = 0$, which
contradicts the injectivity of $T^*$. If V and W are finite
dimensional, then (T*)* = T, so in this case we find 
that T is an injection if and only if T* is a surjection, 
and vice versa. Therefore, we can use duality to con- 
vert an existence problem into a uniqueness problem. 
This conversion of one kind of problem into a different 
kind is another characteristic and very useful feature 
of duality. 
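
In finite dimensions this can be seen with matrices. The Python sketch below rests on an assumption not spelled out in the text: if T is represented by a matrix with respect to bases of V and W, and functionals are written in the corresponding dual bases, then the adjoint is represented by the transposed matrix. Injectivity and surjectivity are read off from the rank, computed here by exact Gaussian elimination; all function names are illustrative.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def apply(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank(M):
    """Rank of a matrix, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def is_injective(M):
    return rank(M) == len(M[0])   # rank equals the dimension of the domain

def is_surjective(M):
    return rank(M) == len(M)      # rank equals the dimension of the target

T = [[1, 0], [0, 1], [1, 1]]      # a map from R^2 to R^3
T_star = transpose(T)             # its adjoint, a map from R^3 to R^2
print(is_injective(T), is_surjective(T))
print(is_injective(T_star), is_surjective(T_star))
```

Here T is an injection but not a surjection, while its adjoint is a surjection but not an injection, and one can also check the defining identity w*(Tv) = (T*w*)(v) with the helper functions apply and dot.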

If a vector space has additional structure, the defini- 
tion of the dual space may well change. For instance, if 
X is a real Banach space [III.64], then X* is defined
to be the space of all continuous linear functionals 
from X to R, rather than the space of all linear func- 
tionals. This space is also a Banach space: the norm 
of a continuous linear functional f is defined to be
$\sup\{|f(x)| : x \in X, \|x\| \le 1\}$. If X is an explicit exam-
ple of a Banach space (such as one of the spaces dis-
cussed in function spaces [III.29]), it can be extremely
useful to have an explicit description of the dual space. 
That is, one would like to find an explicitly described 
Banach space Y and a way of associating with each 
nonzero element y of Y a nonzero continuous linear
functional $\phi_y$ defined on X, in such a way that every
continuous linear functional is equal to $\phi_y$ for some
$y \in Y$.

From this perspective, it is more natural to regard X 
and Y as having the same status. We can reflect this in 
our notation by writing $\langle x, y\rangle$ instead of $\phi_y(x)$. If we
do this, then we are drawing attention to the fact that
the map which takes the pair (x, y) to the real
number $\langle x, y\rangle$ is a continuous bilinear map from $X \times Y$
to $\mathbb{R}$.
More generally, whenever we have two mathematical
objects A and B, a set S of “scalars” of some kind, and
a function $\beta : A \times B \to S$ that is a structure-preserving
map in each variable separately, we can think of the
elements of A as elements of the dual of B, and vice
versa. Functions like $\beta$ are called pairings.

5 Polar Bodies 

Let X be a subset of R^n and let ⟨ · , · ⟩ be the standard
inner product [III.37] on R^n. Then the polar of X,
denoted X°, is the set of all points y ∈ R^n such that
⟨x, y⟩ ≤ 1 for every x ∈ X. It is not hard to check that
X° is closed and convex, and that if X is closed and
convex and contains the origin, then (X°)° = X. Furthermore, if n = 3 and X
is a Platonic solid centered at the origin, then X° is (a 
multiple of) the dual Platonic solid, and if X is the “unit 
ball” of a normed space (that is, the set of all points of 
norm at most 1), then X° is (easily identified with) the 
unit ball of the dual space. 
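As a small illustration of the definition, the sketch below tests membership in the polar of the square with vertices (±1, ±1) in the plane. Because ⟨x, y⟩ is linear in x, it is enough to check the inequality at the vertices. The function names and the choice of the square are assumptions made for the example; exact rational arithmetic is used so that boundary points are handled correctly.

```python
from fractions import Fraction
from itertools import product


def inner(x, y):
    """Standard inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))


def in_polar(vertices, y):
    """True if <x, y> <= 1 for every vertex x, i.e. y lies in the polar of their convex hull."""
    return all(inner(x, y) <= 1 for x in vertices)


SQUARE = list(product([-1, 1], repeat=2))           # vertices of [-1, 1]^2
GRID = [Fraction(k, 10) for k in range(-15, 16)]    # sample points, exact arithmetic

# The polar of the square is the "diamond" |y1| + |y2| <= 1.
AGREES_WITH_DIAMOND = all(
    in_polar(SQUARE, (a, b)) == (abs(a) + abs(b) <= 1) for a in GRID for b in GRID
)
```

The square is the unit ball of the maximum norm and the diamond is the unit ball of the sum-of-absolute-values norm, which matches the remark about unit balls of dual spaces.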

6 Duals of Abelian Groups 

If G is an Abelian group, then a character on G is a 
homomorphism from G to the group T of all complex 
numbers of modulus 1. Two characters can be multi- 
plied together in an obvious way, and this multiplica- 
tion makes the set of all characters on G into another 
Abelian group, called the dual group, Ĝ, of the group G.
Again, if G has a topological structure, then one usually 
imposes an additional continuity condition. 

An important example is when the group is itself T. 
It is not hard to show that the continuous
homomorphisms from T to T all have the form e^{iθ} ↦ e^{inθ} for
some integer n (which may be negative or zero). Thus, 
the dual of T is (isomorphic to) Z. 

This form of duality between groups is called Pontryagin
duality. Note that there is an easily defined pairing
between G and Ĝ: given an element g ∈ G and a
character ψ ∈ Ĝ, we define ⟨g, ψ⟩ to be ψ(g).

Under suitable conditions, this pairing extends to
functions defined on G and Ĝ. For instance, if G and
Ĝ are finite, and f : G → C and F : Ĝ → C,
then we can define ⟨f, F⟩ to be the complex number
|G|^{-1} Σ_{g∈G} Σ_{ψ∈Ĝ} f(g)F(ψ). In general, one obtains a
pairing between a complex Hilbert space [III.37] of
functions on G and a Hilbert space of functions on Ĝ.

This extended pairing leads to another important 
duality. Given a function f in the Hilbert space L^2(T),
its Fourier transform is the function f̂ ∈ ℓ^2(Z) that
is defined by the formula

\[ \hat{f}(n) = \frac{1}{2\pi} \int_0^{2\pi} f(e^{i\theta}) e^{-in\theta} \, d\theta. \]

The Fourier transform, which can be defined similarly 
for functions on other Abelian groups, is immensely 
useful in many areas of mathematics. (See, for example,
Fourier transforms [III.27] and representation
theory [IV.12].) By contrast with some of the previous
examples, it is not always easy to translate a statement
about a function f into an equivalent statement
about its Fourier transform f̂, but this is what gives
the Fourier transform its power: if you wish to understand
a function f defined on T, then you can explore
its properties by looking at both f and f̂. Some properties
will follow from facts that are naturally expressed
in terms of f and others from facts that are naturally
expressed in terms of f̂. Thus, the Fourier transform
“doubles one’s mathematical power.” 
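A finite analogue may help to fix ideas. The sketch below works on the cyclic group Z/N, whose characters are the maps k ↦ e^{2πink/N} for n = 0, ..., N − 1, and defines the transform by averaging f against the conjugate character. The normalization by 1/N is chosen to mirror the factor 1/2π in the formula for T; the function names and sample data are assumptions made for illustration.

```python
import cmath


def character(n, k, N):
    """Value at k of the n-th character of Z/N, namely exp(2*pi*i*n*k/N)."""
    return cmath.exp(2j * cmath.pi * n * k / N)


def fourier(f):
    """Transform of f on Z/N: n -> (1/N) * sum_k f(k) * conj(character_n(k))."""
    N = len(f)
    return [
        sum(f[k] * character(n, k, N).conjugate() for k in range(N)) / N
        for n in range(N)
    ]


def inverse_fourier(F):
    """Recover f from its transform: k -> sum_n F(n) * character_n(k)."""
    N = len(F)
    return [sum(F[n] * character(n, k, N) for n in range(N)) for k in range(N)]


SAMPLE = [1, 2, 0, -1, 3]            # an arbitrary function on Z/5
TRANSFORM = fourier(SAMPLE)
RECOVERED = inverse_fourier(TRANSFORM)
```

With this normalization the average of |f|^2 over the group equals the sum of |f̂|^2 over the dual group, so information about f can be read off from either side.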

7 Homology and Cohomology 

Let X be a compact n-dimensional manifold [I.3 §6.9].
If M and M' are an i-dimensional submanifold and
an (n − i)-dimensional submanifold of X, respectively,
and if they are well-behaved and in sufficiently general
position, then they will intersect in a finite set of
points. If one assigns either 1 or −1 to each of these
points in a natural way that takes account of how M
and M' intersect, then the sum of the numbers at the
points is an invariant called the intersection number of
M and M'. This number turns out to depend only on
the homology classes [IV.10 §4] of M and M'. Thus,
it defines a map from H_i(X) × H_{n−i}(X) to Z, where we
write H_r(X) for the rth homology group of X. This map
is a group homomorphism in each variable separately,
and the resulting pairing leads to a notion of duality
called Poincaré duality, and ultimately to the modern
theory of cohomology, which is dual to homology. As
with some of our other examples, many concepts associated
with homology have dual concepts: for example,
in homology one has a boundary map, whereas in
cohomology there is a coboundary map (in the opposite
direction). Another example is that a continuous map
from X to Y gives rise to a homomorphism from the
homology group H_i(X) to the homology group H_i(Y),
and also to a homomorphism from the cohomology
group H^i(Y) to the cohomology group H^i(X).
8 Further Examples Discussed in This Book

The examples above are not even close to a complete 
list: even in this book there are several more. For 
instance, the article on differential forms [III.16] discusses
a pairing, and hence a duality, between k-forms
and k-dimensional surfaces. (The pairing is given by
integrating the form over the surface.) The article on
distributions [III.18] shows how to use duality to give
rigorous definitions of function-like objects such as the
Dirac delta function. The article on mirror symmetry
[IV.14] discusses an astonishing (and still largely
conjectural) duality between Calabi-Yau manifolds
[III.6] and so-called “mirror manifolds.” Often the mirror
manifold is much easier to understand than the
original manifold, so this duality, like the Fourier trans- 
form, makes certain calculations possible that would 
otherwise be unthinkable. And the article on repre- 
sentation theory [IV.12] discusses the “Langlands 
dual” of certain (non-Abelian) groups: a proper under- 
standing of this duality would solve many major open 
problems. 


III.20 Dynamical Systems and Chaos 


From a scientific point of view, a dynamical system is a 
physical system, such as a collection of planets or the 
water in a canal, that changes over time. Typically, the 
positions and velocities of the parts of such a system 
at a time t depend only on the positions and veloci- 
ties of those parts just before that time, which means 
that the behavior of the system is governed by a system 
of partial differential equations [I.3 §5.3]. Often,
a very simple collection of partial differential equations 
can lead to very complicated behavior of the physical 
system. 

From a mathematical point of view, a dynamical sys- 
tem is any mathematical object that evolves in time 
according to a precise rule that determines the behavior 
of the system at time t from its behavior just before- 
hand. Sometimes, as above, “just beforehand” refers to 
a time infinitesimally earlier, which is why calculus is 
involved. But there is also a vigorous theory of discrete 
dynamical systems, where the “time” t takes integer 
values, and the “time just before t” is t − 1. If f is the
function that tells us how the system at time t depends
on the system at time t − 1, then the system as a whole
can be thought of as the process of iterating f: that is,
applying f over and over again.


As with continuous dynamical systems, a very simple 
function f can lead to very complicated behavior if you
iterate it enough times. In particular, some of the most 
interesting dynamical systems, both discrete ones and 
continuous ones, exhibit an extreme sensitivity to ini- 
tial conditions, which is known as chaos. This is true, 
for example, of the equations that govern weather. One 
cannot hope to specify exactly the wind speed at every 
point on the Earth’s surface (not to mention high above 
it), which means that one has to make do with approx- 
imations. Because the relevant equations are chaotic, 
the resulting inaccuracies, which may be small to start 
with, rapidly propagate and overwhelm the system: you 
could start with a different, equally good approximation
and find that after a fairly short time the system
had evolved in a completely different way. This is why 
accurate forecasting is impossible more than a few days 
in advance. 
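Sensitivity to initial conditions is easy to observe in a discrete system. The sketch below iterates the logistic map f(x) = 4x(1 − x), a standard example that is not mentioned in the text and is chosen here purely for illustration, from two starting points that differ by 10^{-10}. The function names and the starting values are likewise assumptions.

```python
def logistic(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def orbit(x0, steps, f=logistic):
    """The list [x0, f(x0), f(f(x0)), ...] with `steps` iterations of f."""
    points = [x0]
    for _ in range(steps):
        points.append(f(points[-1]))
    return points


FIRST = orbit(0.2, 60)
SECOND = orbit(0.2 + 1e-10, 60)            # an "equally good approximation"
GAPS = [abs(a - b) for a, b in zip(FIRST, SECOND)]
```

The entries of GAPS stay tiny for the first few steps and then grow until the two orbits bear no resemblance to each other, even though the rule being iterated is a quadratic polynomial.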

For more about dynamical systems and chaos, see 
DYNAMICS [IV.15]. 


III.21 Elliptic Curves 

Jordan S. Ellenberg 


An elliptic curve over a field K can be defined as an
algebraic curve of genus 1 over K, endowed with a point 
defined over K. If this definition is too abstract for your 
tastes, then an equivalent definition is the following: an 
elliptic curve is a curve in the plane determined by an 
equation of the form 

\[ y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6. \tag{1} \]

When the characteristic of K is not 2, we can transform
this equation into the simpler form y^2 = f(x),
for some cubic polynomial f. In this sense, an elliptic
curve is a rather concrete object. However, this definition
has given rise to a subject of seemingly inexhaustible
mathematical interest, which has provided a
tremendous fund of ideas, examples, and problems in 
number theory and algebraic geometry. This is in part 
because there are many values of “X” for which it is the 
case that “the simplest interesting example of X is an 
elliptic curve.” 

For instance, the points of an elliptic curve E with 
coordinates in K naturally form an Abelian group, 
which we call E(K). The connected projective varieties
[III.97] that admit a group law of this kind are
called Abelian varieties; and elliptic curves are just
the Abelian varieties that are one dimensional. The
Mordell-Weil theorem tells us that, when K is a number
field and A is an Abelian variety, A(K) is actually a
finitely generated Abelian group, called a Mordell-Weil
group; these Abelian groups are much studied but have
retained much of their mystery (see rational points
on curves and the Mordell conjecture [V.31]).
Even when A is an elliptic curve, in which case we would
call it E instead, there is a great deal that we do not
know, though the Birch-Swinnerton-Dyer conjecture
[V.4] offers a conjectural formula for the rank of
the group E(K). For much more on the topic of rational
points on elliptic curves, see arithmetic geometry 
[IV.6]. 
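The group law can be made concrete in a very small case. The sketch below is an illustration under several assumptions that go beyond the text: the field is the finite field with 17 elements rather than a number field, the curve is taken in the short form y^2 = x^3 + ax + b (available when the characteristic is not 2 or 3), the specific curve y^2 = x^3 + 2x + 2 is arbitrary, and the addition rule is the standard chord-and-tangent construction, which the article does not spell out. It relies on Python 3.8 or later for modular inverses via pow.

```python
def ec_add(P, Q, a, p):
    """Add two points of y^2 = x^3 + a*x + b over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P and Q are inverses of each other
    if P == Q:
        slope = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p   # tangent at P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p        # chord through P, Q
    x3 = (slope * slope - x1 - x2) % p
    y3 = (slope * (x1 - x3) - y1) % p
    return (x3, y3)


def ec_points(a, b, p):
    """All points of the curve over F_p, with None standing for the point at infinity."""
    points = [None]
    for x in range(p):
        rhs = (x**3 + a * x + b) % p
        points.extend((x, y) for y in range(p) if y * y % p == rhs)
    return points


def ec_multiply(n, P, a, p):
    """Compute n*P by repeated addition (adequate for tiny examples only)."""
    result = None
    for _ in range(n):
        result = ec_add(result, P, a, p)
    return result


A, B, PRIME = 2, 2, 17                 # illustrative curve y^2 = x^3 + 2x + 2 over F_17
POINTS = ec_points(A, B, PRIME)
```

For this curve the list POINTS has 19 elements, so the group is cyclic of prime order and every point other than the identity generates it.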

Since E(K) forms an Abelian group, given any prime 
p one can look at the subgroup of elements P such 
that pP = 0. This subgroup is called E(K)[p]. In particular,
we can take the algebraic closure K̄ of K and
look at E(K̄)[p]. It turns out that, when K is a number
field [III.65] (or, for that matter, any field of characteristic
not equal to p), this group is isomorphic to
(Z/pZ)^2, no matter what choice of E we started with.

If the group is the same for all elliptic curves, why is it 
interesting? Because it turns out that the Galois group
[V.24] Gal(K̄/K) permutes the set E(K̄)[p]. In fact, the
action of Gal(K̄/K) on the group (Z/pZ)^2 gives rise to
a representation [III.79] of the Galois group. This is
a foundational example in the theory of Galois representations,
which has become central to contemporary
number theory. Indeed, the proof of Fermat’s last
theorem [V.12] by Andrew Wiles is in the end a theorem
about the Galois representations that arise from
elliptic curves. And what Wiles proved about these special
Galois representations is itself a small special case
of the family of conjectures known as the Langlands
program, which proposes a thoroughgoing correspondence
between Galois representations and automorphic
forms, which are generalized versions of the classical
analytic functions called modular forms [III.61].

In another direction, if E is an elliptic curve over
C, then the set of points of E with complex coordinates,
which we denote E(C), is a complex manifold
[III.90 §3]. It turns out that this manifold can always be
expressed as the quotient of the complex plane by a certain
group Λ of transformations. What is more, these
transformations are just translations: each map sends
z to z + c for some complex number c. (This expression
of E(C) as a quotient is carried out with the help
of elliptic functions [V.34].) Each elliptic curve gives
rise in this way to a subset, indeed a subgroup, of
the complex numbers; the elements of this subgroup
are called periods of the elliptic curve. This construction
can be regarded as the very beginning of Hodge
theory, a powerful branch of algebraic geometry with 
a reputation for extreme difficulty. (The Hodge conjec- 
ture, a central question in the theory, is one of the Clay 
Institute’s million-dollar-prize problems.) 

Yet another point of view is presented by the moduli
space [IV.8] of elliptic curves, denoted M_{1,1}. This is
itself a curve, but not an elliptic one. (In fact, if I am
completely honest, I should say that M_{1,1} is not quite a
curve at all: it is an object called, depending on whom
you ask, an orbifold [IV.7 §7] or an algebraic stack;
you can think of it as a curve from which someone has
removed a few points, folded the points in half or into
thirds, and then glued the folded-up points back in.
You might find it reassuring to know that even professionals
in the subject find this process rather difficult
to visualize.) The curve M_{1,1} is a “simplest example”
in two ways: it is the simplest modular curve, and
simultaneously the simplest moduli space of curves. 


III.22 The Euclidean Algorithm and 
Continued Fractions 

Keith Ball 


1 The Euclidean Algorithm 

The fundamental theorem of arithmetic [V.16],
which states that every positive integer can be factored into
primes in a unique way, has been known since antiquity.
The usual proof depends upon what is known as
the Euclidean algorithm, which constructs the highest
common factor (h, say) of two numbers m and n. In
doing so, it shows that h can be written in the form
am + bn for some pair of integers a, b (not necessarily
positive). For example, the highest common factor of
17 and 7 is 1, and sure enough we can express 1 as the
combination 1 = 5 × 17 − 12 × 7.

The algorithm works as follows. Assume that m is
larger than n and start by dividing m by n to yield a
quotient q_1 and a remainder r_1 that is less than n. Then

\[ m = q_1 n + r_1. \tag{1} \]

Now since r_1 < n we may divide n by r_1 to obtain a
second quotient and remainder:

\[ n = q_2 r_1 + r_2. \tag{2} \]

Continue in this way, dividing r_1 by r_2, r_2 by r_3, and so
on. The remainders get smaller each time but cannot
go below zero. So the process must stop at some point
with a remainder of 0: that is, with a division that comes
out exactly. For instance, if m = 165 and n = 70, the 
algorithm generates the sequence of divisions 


\begin{align}
165 &= 2 \times 70 + 25, \tag{3} \\
70 &= 2 \times 25 + 20, \tag{4} \\
25 &= 1 \times 20 + 5, \tag{5} \\
20 &= 4 \times 5 + 0. \tag{6}
\end{align}


The process guarantees that the last nonzero remain- 
der, 5 in this case, is the highest common factor of m 
and n. On the one hand, the last line shows that 5 is a
factor of the previous remainder 20. Now the last-but-one
line shows that 5 is also a factor of the remainder
25 that occurred one step earlier, because 25 is
expressed as a combination of 20 and 5. Working back
up the algorithm we conclude that 5 is a factor of both
m = 165 and n = 70. So 5 is certainly a common factor
of m and n.

On the other hand, the last-but-one line shows that
5 can be written as a combination of 25 and 20 with
integer coefficients. Since the previous line shows that
20 can be written as a combination of 70 and 25 we can
write 5 in terms of 70 and 25:

\[ 5 = 25 - 20 = 25 - (70 - 2 \times 25) = 3 \times 25 - 70. \]

Continuing back up the algorithm we can express 25 in
terms of 165 and 70 and conclude that

\[ 5 = 3 \times (165 - 2 \times 70) - 70 = 3 \times 165 - 7 \times 70. \]

This shows that 5 is the highest common factor 
of 165 and 70 because any factor of 165 and 70 would 
automatically be a factor of 3 × 165 − 7 × 70: that is, a
factor of 5. Along the way we have shown that the high- 
est common factor can be expressed as a combination 
of the two original numbers m and n. 
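The back-substitution described above can be carried along during the divisions themselves. The following Python sketch is one common way to organize the bookkeeping; the function name and variable names are illustrative. It returns the highest common factor h together with integers a and b such that h = am + bn.

```python
def extended_gcd(m, n):
    """Return (h, a, b) with h the highest common factor of m and n and h == a*m + b*n."""
    # Invariant: old_r == old_a*m + old_b*n and r == a*m + b*n at every step.
    old_r, r = m, n
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r   # the next remainder
        old_a, a = a, old_a - quotient * a
        old_b, b = b, old_b - quotient * b
    return old_r, old_a, old_b


H, COEFF_M, COEFF_N = extended_gcd(165, 70)
```

For m = 165 and n = 70 this reproduces the combination 5 = 3 × 165 − 7 × 70 found above. The coefficients are not unique: for 17 and 7 the function returns a pair different from the 5 and −12 quoted earlier, but it still satisfies a × 17 + b × 7 = 1.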

2 Continued Fractions for Numbers 

During the 1500 years following Euclid, it was realized 
by mathematicians of the Indian and Arabic schools 
that the application of the Euclidean algorithm to a pair 
of integers m and n could be encoded in a formula for 
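the ratio m/n. The equation (1) can be written

\[ \frac{m}{n} = q_1 + \frac{r_1}{n} = q_1 + \frac{1}{F}, \]

where F = n/r_1. Now equation (2) expresses F as

\[ F = q_2 + \frac{r_2}{r_1}. \]

The next step of the algorithm will produce an expression
for r_1/r_2 and so on. If the algorithm stops after
k steps, then we can put these expressions together to
get what is called the continued fraction for m/n:

\[ \frac{m}{n} = q_1 + \cfrac{1}{q_2 + \cfrac{1}{\ddots + \cfrac{1}{q_k}}}. \]

For example,

\[ \frac{165}{70} = 2 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{4}}}. \]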

The continued fraction can be constructed directly 
from the ratio 165/70 = 2.35714... without reference
to the integers 165 and 70. We start by subtracting
from 2.35714... the largest whole number we can:
namely 2. Now we take the reciprocal of what is left:
1/0.35714... = 2.8. Again we subtract off the largest
integer we can, 2, which tells us that q_2 = 2. The reciprocal
of 0.8 is 1.25, so q_3 = 1 and then, finally, 1/0.25 =
4, so q_4 = 4 and the continued fraction stops.
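The subtract-and-invert procedure just described translates directly into code. The sketch below uses exact rational arithmetic so that the process stops cleanly for a rational input; the function name and the cap on the number of terms are assumptions made for the example.

```python
from fractions import Fraction


def partial_quotients(x, max_terms=50):
    """Quotients q_1, q_2, ... obtained by repeatedly removing the integer part and inverting."""
    x = Fraction(x)
    quotients = []
    for _ in range(max_terms):
        whole = x.numerator // x.denominator   # largest whole number not exceeding x
        quotients.append(whole)
        remainder = x - whole
        if remainder == 0:
            break                               # the division came out exactly
        x = 1 / remainder
    return quotients


EXAMPLE = partial_quotients(Fraction(165, 70))
```

For 165/70 the list is 2, 2, 1, 4, the same quotients that appear in the divisions (3) to (6).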

The mathematician John Wallis, who worked in the 
seventeenth century, seems to have been the first to 
give a systematic account of continued fractions and 
to recognize that continued-fraction expansions exist 
for all numbers (not only rational numbers), provided 
that we allow the continued fraction to have infinitely 
many levels. If we start with any positive number, we 
can build its continued fraction in the same way as
for the ratio 2.35714.... For example, if the number
is π = 3.14159265..., we start by subtracting 3, then
take the reciprocal of what is left: 1/0.14159... =
7.06251.... So for π we get that the second quotient
is 7. Continuing the process we build the continued
fraction

\[ \pi = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1 + \cfrac{1}{292 + \cdots}}}}. \tag{7} \]

The numbers 3, 7, 15, and so on, that appear in the
fraction are called the partial quotients of π.

The continued fraction for a real number can be used 
to approximate it by rational numbers. If we truncate 
the continued fraction after several steps, we are left 
with a finite continued fraction which is a rational number:
for example, by truncating the fraction (7) one
level down we get the familiar approximation π ≈
3 + 1/7 = 22/7; at the second level we get the approximation
3 + 1/(7 + 1/15) = 333/106. The truncations
at different levels thus generate a sequence of rational
approximations: the sequence for π begins
3, 22/7, 333/106, 355/113, ....

Whatever positive number x we start with, the
sequence of continued-fraction approximations will
approach x as we move further down the fraction.
Indeed, the formal interpretation of the equation (7) is
precisely that the successive truncations of the fraction
approach π.

Naturally, in order to get better approximations to 
a number x we need to take more “complicated” 
fractions: fractions with larger numerator and denominator.
The continued-fraction approximations to x are
best approximations to x in the following sense: if p/q
is one of these fractions, then it is impossible to find
any fraction r/s that is closer than p/q to x and that
has denominator s smaller than q.

Moreover, if p/q is one of the approximations coming
from the continued fraction for x, then the error
x − p/q cannot be too large relative to the size of the
denominator q; specifically, it is always true that

\[ \left| x - \frac{p}{q} \right| \le \frac{1}{q^2}. \tag{8} \]

This error estimate shows just how special the continued-fraction
approximations are: if you pick a denominator
q without thinking, and then select the numerator
p that makes p/q closest to x, the only thing you
can guarantee is that x lies between (p − 1/2)/q and
(p + 1/2)/q. So the error could be as large as 1/(2q),
which is much bigger than 1/q^2 if q is a large integer.
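The truncations and the error bound |x − p/q| ≤ 1/q^2 can be checked numerically for π. The sketch below builds the truncations from a list of partial quotients using the usual two-term recurrence for numerators and denominators, which the text does not state and which is used here as a convenience. The list 3, 7, 15, 1, 292 is taken as given, and the double-precision value math.pi stands in for π, which is accurate enough at this scale.

```python
import math
from fractions import Fraction


def convergents(quotients):
    """Successive truncations of the continued fraction with the given partial quotients."""
    p_prev, p = 1, quotients[0]
    q_prev, q = 0, 1
    result = [Fraction(p, q)]
    for a in quotients[1:]:
        p_prev, p = p, a * p + p_prev   # next numerator
        q_prev, q = q, a * q + q_prev   # next denominator
        result.append(Fraction(p, q))
    return result


PI_QUOTIENTS = [3, 7, 15, 1, 292]
PI_CONVERGENTS = convergents(PI_QUOTIENTS)
ERRORS = [abs(Fraction(math.pi) - c) for c in PI_CONVERGENTS]
WITHIN_BOUND = [
    err <= Fraction(1, c.denominator**2) for err, c in zip(ERRORS, PI_CONVERGENTS)
]
```

Every entry of WITHIN_BOUND is true, and the error for 355/113 is far smaller than the bound requires.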

Sometimes a continued-fraction approximation to x 
can have even smaller error than is guaranteed by (8). 
For example, the approximation π ≈ 355/113 that we
get by truncating (7) at the third level is exceptionally
accurate, the reason being that the next partial quotient,
292, is rather large. So we are not changing the
fraction much by ignoring the tail 1/(292 + 1/(1 + ⋯)).
In this sense, the most difficult number to approximate
by fractions is the one with the smallest possible partial
quotients, i.e., the one with all its partial quotients
equal to 1. This number,

\[ 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}, \tag{9} \]

can be easily calculated because the sequence of partial
quotients is periodic: it repeats itself. If we call the
number φ, then φ − 1 is 1/(1 + 1/(1 + ⋯)). The reciprocal
of this number is exactly the continued fraction
(9) for φ. Hence

\[ \frac{1}{\phi - 1} = \phi, \]

which in turn implies that φ^2 − φ = 1. The roots of
this quadratic equation are (1 + √5)/2 = 1.618... and
(1 − √5)/2 = −0.618.... Since the number we are trying
to find is positive, it is the first of these roots: the
so-called golden ratio. 
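The self-similarity used in this argument also gives a quick way to compute the truncations: each one is obtained from the previous one by the rule t ↦ 1 + 1/t. The sketch below, with illustrative names, iterates that rule in floating-point arithmetic and compares the result with (1 + √5)/2.

```python
import math


def all_ones_truncation(levels):
    """Value of 1 + 1/(1 + 1/(1 + ...)) cut off after `levels` nested fractions."""
    value = 1.0
    for _ in range(levels):
        value = 1.0 + 1.0 / value
    return value


GOLDEN = (1 + math.sqrt(5)) / 2
TRUNCATIONS = [all_ones_truncation(k) for k in range(1, 8)]
```

The first few values are 2, 3/2, 5/3, 8/5, and so on, ratios of consecutive Fibonacci numbers, and they approach the golden ratio from alternate sides.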

It is quite easy to show that, just as (9) represents 
the positive solution of the equation x 2 - x - 1 = 0, 
any other periodic continued fraction represents a root 
of a quadratic equation. This fact seems to have been
understood already in the sixteenth century. It is quite
a lot trickier to prove the converse: that the continued
fraction of any quadratic surd is periodic. This was
established by Lagrange [VI.22] during the eighteenth
century and is closely related to the existence of units
in quadratic number fields (see algebraic numbers
[IV.3]).

3 Continued Fractions for Functions 

Several of the most important functions in mathematics
are most easily described using infinite sums. For
example, the exponential function [III.25] has the
infinite series

\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots. \]

There are also a number of functions that have simple
continued-fraction expansions: continued fractions
involving a variable like x. These are probably the most
important continued fractions historically.

For example, the function x ↦ tan x has the continued
fraction

\[ \tan x = \cfrac{x}{1 - \cfrac{x^2}{3 - \cfrac{x^2}{5 - \cfrac{x^2}{7 - \cdots}}}}, \]

valid for any value of x other than the odd multiples
of π/2, where the tangent function has a vertical
asymptote.

Whereas the infinite series of a function can be truncated
to provide polynomial approximations to the
function, truncation of the continued fraction provides
approximations by rational functions: functions that
are ratios of polynomials. For instance, if we truncate
the fraction for the tangent after one level, then we get
the approximation

\[ \tan x \approx \frac{x}{1 - x^2/3} = \frac{3x}{3 - x^2}. \]

This continued fraction, and the rapidity with which its
truncations approach tan x, played the central role in
the proof that π is irrational: that π is not the ratio of
two whole numbers. The proof was found by Johann
Lambert in the 1760s. He used the continued fraction
to show that if x is a rational number (other than 0),
then tan x is not. But tan π/4 = 1 (which certainly is
rational), so π/4 cannot be.
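The speed of convergence is easy to observe numerically. The sketch below evaluates the truncations of the continued fraction x/(1 − x^2/(3 − x^2/(5 − ⋯))) from the innermost level outward and compares them with the library tangent. The function name and the convention for counting levels (one level gives 3x/(3 − x^2)) are assumptions made for the example.

```python
import math


def tan_cf(x, levels):
    """Truncation of x/(1 - x^2/(3 - x^2/(5 - ...))) keeping `levels` nested fractions."""
    denominator = 2 * levels + 1                 # innermost odd number kept
    for k in range(levels, 0, -1):
        denominator = (2 * k - 1) - x * x / denominator
    return x / denominator


APPROXIMATIONS = [tan_cf(1.0, k) for k in range(1, 7)]
```

At x = 1 the first truncation gives 1.5 against tan 1 = 1.5574..., and a handful of further levels already agree with the library value to many decimal places.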


III.23 The Euler and Navier-Stokes 
Equations 

Charles Fefferman 


The Euler and Navier-Stokes equations describe the 
motion of an idealized fluid. They are important in sci- 
ence and engineering, yet they are very poorly under- 
stood. They present a major challenge to mathematics. 

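To state the equations we work in Euclidean space
R^d, with d = 2 or 3. Suppose that, at position
x = (x_1, ..., x_d) ∈ R^d and at time t ∈ R, the
fluid is moving with a velocity vector u(x, t) =
(u_1(x, t), ..., u_d(x, t)) ∈ R^d, and the pressure in the
fluid is p(x, t) ∈ R. The Euler equation is

\[ \Bigl( \frac{\partial}{\partial t} + \sum_{j=1}^{d} u_j \frac{\partial}{\partial x_j} \Bigr) u_i(x,t) = -\frac{\partial p}{\partial x_i}(x,t) \quad (i = 1, \ldots, d) \tag{1} \]

for all (x, t); and the Navier-Stokes equation is

\[ \Bigl( \frac{\partial}{\partial t} + \sum_{j=1}^{d} u_j \frac{\partial}{\partial x_j} \Bigr) u_i(x,t) = \nu \Bigl( \sum_{j=1}^{d} \frac{\partial^2}{\partial x_j^2} \Bigr) u_i(x,t) - \frac{\partial p}{\partial x_i}(x,t) \quad (i = 1, \ldots, d) \tag{2} \]

for all (x, t). Here, ν > 0 is a coefficient of friction called
the “viscosity” of the fluid.

In this article we restrict our attention to incompressible
fluids, which means that, in addition to requiring
that they satisfy (1) or (2), we also demand that

\[ \operatorname{div} u \equiv \sum_{j=1}^{d} \frac{\partial u_j}{\partial x_j} = 0 \tag{3} \]

for all (x, t). The Euler and Navier-Stokes equations
are nothing but Newton's law F = ma applied to an
infinitesimal portion of the fluid. In fact, the vector

\[ \Bigl( \frac{\partial}{\partial t} + \sum_{j=1}^{d} u_j \frac{\partial}{\partial x_j} \Bigr) u \]

is easily seen to be the acceleration experienced by a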
molecule of fluid that finds itself at position x at time t. 
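The claim about acceleration is an instance of the chain rule. As a brief sketch, suppose x(t) denotes the path of a single fluid particle (a notation introduced here for illustration), so that its velocity is dx_j/dt = u_j(x(t), t). Differentiating the velocity along the path gives:

```latex
% x(t) is the path of a fluid particle, so dx_j/dt = u_j(x(t), t).
\[
\frac{d}{dt}\, u_i\bigl(x(t), t\bigr)
  = \frac{\partial u_i}{\partial t}
    + \sum_{j=1}^{d} \frac{dx_j}{dt}\, \frac{\partial u_i}{\partial x_j}
  = \Bigl( \frac{\partial}{\partial t}
    + \sum_{j=1}^{d} u_j \frac{\partial}{\partial x_j} \Bigr) u_i ,
\]
% with all partial derivatives evaluated at the point (x(t), t).
```

The first term records how the velocity field changes at a fixed point, and the sum records the change felt by the particle because it moves to a place where the velocity is different.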

The forces F leading to the Euler equation arise 
entirely from pressure gradients (e.g., if the pressure 
increases with height, then there is a net force pushing
the fluid down). The additional term

\[ \nu \Bigl( \sum_{j=1}^{d} \frac{\partial^2}{\partial x_j^2} \Bigr) u \]

in (2) arises from frictional forces.

The Navier-Stokes equations agree very well with 
experiments on real fluids under many and varied 
circumstances. Since fluids are important, so are the 
Navier-Stokes equations. 

The Euler equation is simply the limiting case ν = 0
of Navier-Stokes. However, as we shall see, solutions of
the Euler equation behave very differently from solutions
of the Navier-Stokes equation, even when ν is
small.

We want to understand the solutions of the Euler 
equations (1) and (3), or the Navier-Stokes equations 
(2) and (3), together with an initial condition 

\[ u(x, 0) = u^0(x) \quad \text{for all } x \in \mathbb{R}^d, \tag{4} \]

where u^0(x) is a given initial velocity, i.e., a vector-valued
function on R^d. For consistency with (3), we
assume that

\[ \operatorname{div} u^0(x) = 0 \quad \text{for all } x \in \mathbb{R}^d. \]

Also, to avoid physically unreasonable conditions, such
as infinite energy, we demand that u^0(x), as well as
u(x, t) for each fixed t, should tend to zero “fast
enough” as |x| → ∞. We will not specify here exactly
what is meant by “fast enough,” but we assume from 
now on that we are dealing only with such rapidly 
decreasing velocities. 

A physicist or engineer would want to know how 
to calculate efficiently and accurately the solution to 
the Navier-Stokes equations (2)-(4), and to understand 
how that solution behaves. A mathematician asks first 
whether a solution exists, and, if so, whether there 
is only one solution. Although the Euler equation is 
250 years old and the Navier-Stokes equation well over 
100 years old, there is no consensus among experts as 
to whether Navier-Stokes or Euler solutions exist for all 
time, or whether instead they “break down” at a finite 
time. Definitive answers supported by rigorous proofs 
seem a long way off. 

Let us state more precisely the problem of “breakdown”
for the Euler and Navier-Stokes equations. Equations
(1)-(3) refer to the first and second derivatives of
u(x, t). It is natural to suppose that the initial velocity
u^0(x) in (4) has derivatives
of all orders, and that these derivatives tend to zero
“fast enough” as |x| → ∞. We then ask whether the
Navier-Stokes equations (2)-(4), or the Euler equations
(1), (3), and (4), have solutions u(x, t), p(x, t), defined
for all x ∈ R^d and t ≥ 0, such that the derivatives
∂^α_{x,t} u(x, t)
and ∂^α_{x,t} p(x, t) of all orders exist for all x ∈ R^d, t ∈
[0, ∞) (and tend to zero “fast enough” as |x| → ∞). A
pair u and p with these properties is called a “smooth”
solution for the Euler or Navier-Stokes equations. No
one knows whether such solutions exist (in the three-dimensional
case). It is known that, for some positive
time T = T(u^0) > 0 depending on the initial velocity
u^0 in (4), there exist smooth solutions u(x, t), p(x, t)
to the Euler or Navier-Stokes equations, defined for x ∈
R^d and t ∈ [0, T).

In two space dimensions (one speaks of “2D Euler” 
or “2D Navier-Stokes”), we can take T = +∞; in other
words, there is no “breakdown” for 2D Euler or 2D
Navier-Stokes. In three space dimensions, no one can
rule out the possibility that, for some finite T = T(u^0)
as above, there is an Euler or Navier-Stokes solution
u(x, t), p(x, t), which is defined and smooth on

\[ \Omega = \{ (x, t) : x \in \mathbb{R}^3,\ t \in [0, T) \}, \]

such that some derivative |∂^α_{x,t} u(x, t)| or |∂^α_{x,t} p(x, t)|
is unbounded on Ω. This would imply that there is
no smooth solution past time T. (We say that the 
3D Navier-Stokes or Euler solution “breaks down” at 
time T.) Perhaps this can actually happen for 3D Euler 
and/or Navier-Stokes. No one knows what to believe. 

Many computer simulations of the 3D Navier-Stokes 
and Euler equations have been carried out. Navier- 
Stokes simulations exhibit no evidence of breakdown, 
but this may mean only that initial velocities u^0 that
lead to breakdown are exceedingly rare. Solutions of 
3D Euler behave very wildly, so that it is hard to 
decide whether a given numerical study indicates a 
breakdown. Indeed, it is notoriously hard to perform a 
reliable numerical simulation of the 3D Euler equations. 

It is useful to study how a Navier-Stokes or Euler 
solution behaves if one assumes that there is a breakdown.
For instance, if there is a breakdown at time
T < ∞ for the 3D Euler equation, then a theorem of
Beale, Kato, and Majda asserts that the “vorticity”

\[ \omega(x,t) = \operatorname{curl}(u(x,t)) = \Bigl( \frac{\partial u_3}{\partial x_2} - \frac{\partial u_2}{\partial x_3},\ \frac{\partial u_1}{\partial x_3} - \frac{\partial u_3}{\partial x_1},\ \frac{\partial u_2}{\partial x_1} - \frac{\partial u_1}{\partial x_2} \Bigr) \tag{5} \]

grows so large as t → T that the integral

\[ \int_0^T \Bigl( \max_{x \in \mathbb{R}^3} |\omega(x,t)| \Bigr) \, dt \]

diverges. This has been used to invalidate some plausible
computer simulations that allegedly indicated a
breakdown for 3D Euler. It is also known that the direction
of the vorticity vector ω(x, t) must vary wildly
with x, as t approaches a finite breakdown time T.

The vector ω in (5) has a natural physical meaning: it
indicates how the fluid is rotating about the point x at
time t. A small pinwheel placed in the fluid in position
x at time t with its axis of rotation oriented parallel
to ω(x, t) would be turned by the fluid at an angular
velocity proportional to |ω(x, t)|.
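The link between vorticity and rotation can be seen in the simplest rotating flow. The sketch below approximates the curl by central differences and applies it to a rigid rotation about the third axis with angular velocity 1.5, a flow chosen here for illustration. The function names, the step size, and the sample point are assumptions; the components follow the usual definition of the curl.

```python
def curl(u, x, h=1e-5):
    """Central-difference approximation to the curl of the vector field u at the point x."""

    def partial(i, j):
        # Approximate the derivative of the i-th component of u in the j-th coordinate.
        forward, backward = list(x), list(x)
        forward[j] += h
        backward[j] -= h
        return (u(forward)[i] - u(backward)[i]) / (2 * h)

    return (
        partial(2, 1) - partial(1, 2),
        partial(0, 2) - partial(2, 0),
        partial(1, 0) - partial(0, 1),
    )


OMEGA = 1.5   # angular velocity of the illustrative rigid rotation


def rigid_rotation(x):
    """Velocity field of a rigid rotation about the third axis."""
    return (-OMEGA * x[1], OMEGA * x[0], 0.0)


VORTICITY = curl(rigid_rotation, (0.3, -0.2, 1.0))
```

The computed vorticity points along the axis of rotation and has length 2 × 1.5 = 3, so for a rigid rotation the vorticity is twice the angular velocity, wherever the sample point is placed.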

For the 3D Navier-Stokes equation, a recent result of 
V. Šverák shows that if there is a breakdown, then the
pressure p(x, t) is unbounded, both above and below. 

A promising idea, pioneered by J. Leray in the 1930s, 
is to study “weak solutions” of the Navier-Stokes equa- 
tions. The idea is as follows. At first glance, the Navier- 
Stokes equations (2) and (3) make sense only when 
u(x,t), p(x,t) are sufficiently smooth: for example, 
one would like the second derivatives of u with respect 
to the Xj to exist. However, a formal calculation shows 
that (2) and (3) are apparently equivalent to conditions 
that we shall call (2') and (3'), which make sense even 
when u(x,t) and p(x,t) are very rough. Let us first 
see how to derive (2') and (3'), and then we will discuss 
their use. 

The starting point is the observation that a function 
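F on R^n is equal to zero if and only if ∫_{R^n} Fθ dx = 0 for
every smooth function θ. Applying this remark to the
3D Navier-Stokes equations (2) and (3) and performing
a simple formal computation (an integration by parts),
we find that (2) and (3) are equivalent to the following
equations:

\[ \iint_{\mathbb{R}^3 \times (0,\infty)} \Bigl\{ -\sum_{i=1}^{3} u_i \frac{\partial \theta_i}{\partial t} - \sum_{i,j=1}^{3} u_i u_j \frac{\partial \theta_i}{\partial x_j} \Bigr\} \, dx \, dt = \iint_{\mathbb{R}^3 \times (0,\infty)} \Bigl\{ \nu \sum_{i,j=1}^{3} u_i \frac{\partial^2 \theta_i}{\partial x_j^2} + \sum_{i=1}^{3} p \frac{\partial \theta_i}{\partial x_i} \Bigr\} \, dx \, dt \tag{2'} \]

and

\[ \iint_{\mathbb{R}^3 \times (0,\infty)} \Bigl\{ \sum_{i=1}^{3} u_i \frac{\partial \varphi}{\partial x_i} \Bigr\} \, dx \, dt = 0. \tag{3'} \]

More precisely, given any smooth functions u(x, t) and
p(x, t), equations (2) and (3) hold if and only if (2')
and (3') are satisfied for arbitrary smooth functions
θ_1(x, t), θ_2(x, t), θ_3(x, t), and φ(x, t) that vanish
outside a compact subset of R^3 × (0, ∞).

We call θ_1, θ_2, θ_3, and φ test functions, and we
say that u and p form a weak solution of 3D Navier-Stokes
if (2') and (3') hold for all test functions. Since all the derivatives in (2') and (3') are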
applied to smooth test functions, equations (2') and (3') 
make sense even for very rough functions u and p. To 
summarize, we have the following conclusion. 

A smooth pair (u, p) solves 3D Navier-Stokes if and
only if it is a weak solution. However, the idea of a weak
solution makes sense even for rough (u, p).
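A one-dimensional example, not taken from the article, shows how moving derivatives onto a test function makes sense of a derivative that does not exist everywhere. Take u(x) = |x| on the real line, which has a corner at 0, and let θ be a smooth function that vanishes outside a bounded interval. Integrating by parts on each half-line gives:

```latex
% u(x) = |x| on the real line; \theta is smooth and vanishes outside a bounded interval.
\[
\int_{-\infty}^{\infty} |x|\, \theta'(x)\, dx
  = \int_{0}^{\infty} x\, \theta'(x)\, dx - \int_{-\infty}^{0} x\, \theta'(x)\, dx
  = -\int_{0}^{\infty} \theta(x)\, dx + \int_{-\infty}^{0} \theta(x)\, dx
  = -\int_{-\infty}^{\infty} \operatorname{sgn}(x)\, \theta(x)\, dx .
\]
```

The left-hand side never differentiates |x|, yet the identity says that sgn(x) plays the role of its derivative when tested against every θ. Weak solutions use the same device in three space dimensions and time.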

We hope to use weak solutions, by carrying out the 
following plan. 

Step (i): prove that suitable weak solutions exist for 3D 
Navier-Stokes on all of R^3 × (0, ∞).

Step (ii): prove that any suitable weak solution of 3D 
Navier-Stokes must be smooth. 

Step (iii): conclude that the suitable weak solution
constructed in step (i) is in fact a smooth solution of the
3D Navier-Stokes equations on all of \(\mathbb{R}^3 \times (0, \infty)\).

Here, “suitable” means “not too big”; we omit the pre- 
cise definition. 

Analogues of the above plan have succeeded for 
interesting partial differential equations. But for 3D 
Navier-Stokes, the plan has been only partly carried 
out. It has been known for a long time how to con- 
struct suitable weak solutions of 3D Navier-Stokes, but 
the uniqueness of these solutions has not been proved. 
Thanks to the work of Scheffer, of Lin, and of Caffarelli,
Kohn, and Nirenberg, it is known that any suitable weak
solution to 3D Navier-Stokes must be smooth (i.e., it
must possess derivatives of all orders) outside a set
\(E \subset \mathbb{R}^3 \times (0, \infty)\) of small fractal dimension [III.17].
In particular, E cannot contain a curve. To rule out a
breakdown, one would have to show that E is the empty
set.

For the Euler equation, weak solutions again make 
sense, but examples due to Scheffer and Shnirelman
show that they can behave very strangely. A two- 
dimensional fluid that is initially at rest and subject 
to no outside forces can suddenly start moving in a 
bounded region of space and then return to rest. Such 
behavior can occur for a weak solution of 2D Euler. 

The Navier-Stokes and Euler equations give rise to 
a number of fundamental problems in addition to the 
breakdown problem discussed above. We finish this 
article with one such problem. Suppose that we fix an
initial velocity \(u^0(x)\) for the 3D Navier-Stokes or Euler
equation. The energy \(E_0\) at time t = 0 is given by

\[ E_0 = \frac{1}{2} \int_{\mathbb{R}^3} |u(x,0)|^2 \, dx. \]

For \(\nu \geq 0\), let \(u^{(\nu)}(x,t) = (u_1^{(\nu)}, u_2^{(\nu)}, u_3^{(\nu)})\) denote the
Navier-Stokes solution with initial velocity \(u^0\) and with
viscosity \(\nu\). (If \(\nu = 0\), then \(u^{(0)}\) is an Euler solution.)
We assume that \(u^{(\nu)}\) exists for all time, at least when
\(\nu > 0\). The energy for \(u^{(\nu)}(x,t)\) at time \(t \geq 0\) is given
by

\[ E^{(\nu)}(t) = \frac{1}{2} \int_{\mathbb{R}^3} |u^{(\nu)}(x,t)|^2 \, dx. \]

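An elementary calculation based on (1)-(3) (we multiply
(1) or (2) by \(u_i(x)\), sum over i, integrate over all \(x \in \mathbb{R}^3\),
and integrate by parts) shows that

\[ \frac{d}{dt} E^{(\nu)}(t) = -\nu \int_{\mathbb{R}^3} \sum_{i,j=1}^{3} \left( \frac{\partial u_i^{(\nu)}}{\partial x_j} \right)^2 dx. \tag{6} \]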

In particular, for the Euler equation we have \(\nu = 0\), and
(6) shows that the energy is equal to \(E_0\), independently
of time, as long as the solution exists.

Now suppose that \(\nu\) is small but nonzero. From (6) it
is natural to guess that \(|(d/dt)E^{(\nu)}(t)|\) is small when \(\nu\)
is small, so that the energy remains almost constant for
a long time. However, numerical and physical experiments
suggest strongly that this is not the case. Instead,
it seems that there exists \(T_0 > 0\), depending on \(u^0\) but
independent of \(\nu\), such that the fluid loses at least half
of its initial energy by time \(T_0\), regardless of how small
\(\nu\) is (provided that \(\nu > 0\)).

It would be very important if one could prove (or dis- 
prove) this assertion. We need to understand why a tiny 
viscosity dissipates a lot of energy. 


III.24 Expanders 

Avi Wigderson 


1 The Basic Definition 

An expander is a special sort of graph [III.34] that has
remarkable properties and many applications. Roughly 
speaking, it is a graph that is very hard to disconnect 
because every set of vertices in the graph is joined by 
many edges to its complement. More precisely, we say 
that a graph G with n vertices is a c-expander if for every
\(m \leq \frac{1}{2}n\) and every set S of m vertices there are at least
cm edges between S and the complement of S.
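The definition can be checked directly on very small graphs. The following is a minimal brute-force sketch, not an efficient method: the function name `edge_expansion` and the dictionary-of-neighbor-sets representation are assumptions made for illustration, and the running time is exponential in the number of vertices.

```python
from itertools import combinations

def edge_expansion(adj):
    """Return the largest c for which the graph is a c-expander.

    adj maps each vertex to the set of its neighbors (simple undirected graph).
    Every nonempty set S with |S| <= n/2 is examined, so this is only
    feasible for tiny graphs.
    """
    vertices = list(adj)
    n = len(vertices)
    best = float("inf")
    for m in range(1, n // 2 + 1):
        for subset in combinations(vertices, m):
            s = set(subset)
            # Count edges with one endpoint in S and the other outside S.
            boundary = sum(1 for u in s for v in adj[u] if v not in s)
            best = min(best, boundary / m)
    return best

# Illustrative example: the cycle on 8 vertices (2-regular).
cycle8 = {v: {(v - 1) % 8, (v + 1) % 8} for v in range(8)}
print(edge_expansion(cycle8))
```

For the 8-cycle the worst set is an arc of four consecutive vertices, which has only two edges leaving it.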

This definition is particularly interesting when G is 
sparse: in other words, when G has few edges. We shall 
concentrate on the important special case where G is 
regular of degree d for some fixed constant d that is
independent of the number n of vertices: this means 
that every vertex is joined to exactly d others. When 
G is regular of degree d, the number of edges from S 
to its complement is obviously at most dm, so if c is 
some fixed constant (that is, not tending to zero with 
n), then the number of edges between any set of vertices
and its complement is within a constant factor of the
largest number possible. As this comment suggests, we
are usually interested not in single graphs but in infinite
families of graphs: we say that an infinite family of
d-regular graphs is a family of expanders if there is a
constant c > 0 such that each graph in the family is a
c-expander.

2 The Existence of Expanders 

The first person to prove that expanders exist was
Pinsker, who proved that if n is large and \(d \geq 3\),
then almost every d-regular graph with n vertices is
an expander. That is, he proved that there is a constant
c > 0 such that for every fixed \(d \geq 3\), the proportion of
d-regular graphs with n vertices that are not expanders
tends to zero as n tends to infinity. This proof was an
early example of the probabilistic method [IV.23 §3] 
in combinatorics. It is not hard to see that if a d-regular 
graph is chosen uniformly at random, then the expected 
number of edges leaving a set S is \(d|S|(n - |S|)/n\),
which is at least \(\frac{1}{2}d|S|\) when \(|S| \leq \frac{1}{2}n\). Standard “tail estimates” are
then used to prove that, for any fixed S, the probabil- 
ity that the number of edges leaving S is significantly 
different from its expected value is extremely small: so 
small that if we add up the probabilities for all sets, 
then even the sum is small. So with high probability 
all sets S have at least c|S| edges to their comple- 
ment. (In one respect this description is misleading: it 
is not a straightforward matter to discuss probabilities 
of events concerning random d-regular graphs because 
the edges are not independently chosen. However, Bollobás
has defined an equivalent model for random
regular graphs that allows them to be handled.)

Note that this proof does not give us an explicit 
description of any expander: it merely proves that 
they exist in abundance. This is a drawback to the 
proof, because, as we shall see later, there are appli- 
cations for expanders that depend on some kind of 
explicit description, or at least on an efficient method 
of producing expanders. But what exactly is an “explicit 
description” or an “efficient method”? There are many 
possible answers to this question, of which we shall discuss
two. The first is to demand that there is an algorithm
that can list, for any integer n, all the vertices and
edges of a d-regular c-expander with around n vertices
(we could be flexible about this and ask for the number
of vertices to be between n and \(n^2\), say) in a time
that is polynomial in n. (See computational complex- 
ity [IV.21 §2] for a discussion of polynomial-time algo- 
rithms.) Descriptions of this kind are sometimes called 
“mildly explicit.” 

To get an idea of what is “mild” about this, consider 
the following graph. Its vertices are all 01 sequences 
of length k, and two such sequences are joined by an 
edge if they differ in exactly one place. This graph is 
sometimes called the discrete cube in k dimensions. It 
has \(2^k\) vertices, so the time taken to list all the vertices
and edges will be huge compared with k. However, for 
many purposes we do not actually need such a list: what 
matters is that there is a concise way of representing 
each vertex, and an efficient algorithm for listing the 
(representations of the) neighbors of any given vertex. 
Here the 01 sequence itself is a very concise representation,
and given such a sequence \(\sigma\) it is very easy to
list, in a time that is polynomial in k rather than \(2^k\), the
k sequences that can be obtained by altering \(\sigma\) in one
place. Graphs that can be efficiently described in this
way (so that listing the neighbors of a vertex takes a 
time that is polynomial in the logarithm of the number 
of vertices) are called strongly explicit. 
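As a small illustration of what "strongly explicit" means for the discrete cube, the sketch below lists the neighbors of a single vertex without ever building the whole graph. The function name `cube_neighbors` and the use of strings for 01 sequences are choices made here for illustration.

```python
def cube_neighbors(sigma):
    """List the k neighbors of a 01 sequence sigma in the discrete cube.

    sigma is a string of '0' and '1' characters of length k; each neighbor
    differs from sigma in exactly one place.
    """
    flip = {"0": "1", "1": "0"}
    return [sigma[:i] + flip[sigma[i]] + sigma[i + 1:] for i in range(len(sigma))]

print(cube_neighbors("0110"))
```

The work done is polynomial in k, the length of the sequence, even though the graph itself has \(2^k\) vertices.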

The quest for explicitly constructed expanders has 
been the source of some beautiful mathematics, which 
has often used ideas from fields such as number theory
and algebra. The first explicit expander was discovered 
by Margulis. We give his construction and another one; 
we stress that although these constructions are very 
simple to describe, it is rather less easy to prove that 
they really are expanders. 

Margulis’s construction gives an 8-regular graph \(G_m\)
for every integer m. The vertex set is \(\mathbb{Z}_m \times \mathbb{Z}_m\), where
\(\mathbb{Z}_m\) is the set of all integers mod m. The neighbors of
the vertex (x, y) are (x + y, y), (x - y, y), (x, y + x),
(x, y - x), (x + y + 1, y), (x - y + 1, y), (x, y + x + 1),
(x, y - x + 1) (all operations are mod m). Margulis’s
proof that \(G_m\) is an expander was based on representation
theory [IV.12] and did not provide any specific
bound on the expansion constant c. Gabber and Galil
later derived such a bound using harmonic analysis
[IV.18]. Note that this family of graphs is strongly
explicit.
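The neighbor rule above translates almost word for word into code. This is a sketch that simply transcribes the eight expressions listed in the text; the function name `margulis_neighbors` is hypothetical. The list it returns may contain repeated entries for particular vertices, so it is best read as a list of edge endpoints rather than a set of distinct neighbors.

```python
def margulis_neighbors(x, y, m):
    """The eight neighbors of (x, y) as listed in the text, reduced mod m."""
    return [
        ((x + y) % m, y),
        ((x - y) % m, y),
        (x, (y + x) % m),
        (x, (y - x) % m),
        ((x + y + 1) % m, y),
        ((x - y + 1) % m, y),
        (x, (y + x + 1) % m),
        (x, (y - x + 1) % m),
    ]

print(margulis_neighbors(2, 3, 5))
```

Each call uses a constant number of arithmetic operations on numbers below m, which is why the family is strongly explicit.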

Another construction provides, for each prime p, a 
3-regular graph with p vertices. This time the vertex 
set is \(\mathbb{Z}_p\), and a vertex x is connected to x + 1, x - 1,
and \(x^{-1}\) (where this is the inverse of x mod p, and we
define the inverse of 0 to be 0). The proof that these 
graphs are expanders depends on a deep result in num- 
ber theory, called the Selberg 3/16 theorem. This family 
is only mildly explicit, since we are at present unable to 
generate large primes deterministically. 
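For a given prime p the neighbors in this second construction are also easy to compute. The sketch below assumes p is an odd prime supplied by the caller (finding such a prime is the part the text identifies as the obstacle); the function name is hypothetical, and the inverse is computed with Fermat's little theorem, which is one standard way to do it.

```python
def inverse_graph_neighbors(x, p):
    """Neighbors of x in the 3-regular graph on Z_p: x + 1, x - 1 and x^{-1}.

    Assumes p is an odd prime. By Fermat's little theorem x^(p-2) is the
    inverse of x mod p when x != 0, and the same formula sends 0 to 0,
    matching the convention that the inverse of 0 is 0.
    """
    inverse = pow(x, p - 2, p)
    return [(x + 1) % p, (x - 1) % p, inverse]

print(inverse_graph_neighbors(3, 7))
```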

Until recently, the only known methods for explic- 
itly constructing expanders were algebraic. However, in 
2002 Reingold, Vadhan, and Wigderson introduced the 
so-called zigzag product of graphs, and used it to give 
a combinatorial, iterative construction of expanders. 

3 Expanders and Eigenvalues 

The condition that a graph should be a c-expander 
involves all subsets of the vertices. Since there are expo- 
nentially many subsets, it would seem on the face of 
it that checking whether a graph is a c-expander is 
an exponentially long task. And, indeed, this problem 
turns out to be co-NP complete [IV.21 §§3,4]. However,
we shall now describe a closely related property
that can be checked in polynomial time, and which is 
in some ways more natural. 

Given a graph G with n vertices, its adjacency matrix
A is the \(n \times n\) matrix where \(A_{uv}\) is defined to be 1 if
u is joined to v and 0 otherwise. This matrix is real
and symmetric, and therefore has n real eigenvalues
[I.3 §4.3] \(\lambda_1, \lambda_2, \dots, \lambda_n\), which we name in such a way
that \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n\). Moreover, eigenvectors
[I.3 §4.3] with distinct eigenvalues are orthogonal.

It turns out that these eigenvalues encode a great deal 
of useful information about G. But before we come to 
this, let us briefly consider how A acts as a linear map. 
If we are given a function f, defined on the vertices of
G, then Af is the function whose value at u is the sum
of f(v) over all neighbors v of u. From this we see
immediately that if G is d-regular and f is the function
that is 1 at every vertex, then Af is the function that is
d at every vertex. In other words, a constant function
is an eigenvector of A with eigenvalue d. It is also not
hard to see that this is the largest possible eigenvalue
\(\lambda_1\), and that if the graph is connected, then the second
largest eigenvalue \(\lambda_2\) will be strictly less than d.
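The action of A on functions can be written without forming the matrix at all. In this sketch the graph is again a dictionary of neighbor sets and a function on the vertices is a dictionary of values; both representations, and the name `apply_adjacency`, are assumptions made for illustration.

```python
def apply_adjacency(adj, f):
    """Return Af: its value at u is the sum of f(v) over all neighbors v of u."""
    return {u: sum(f[v] for v in adj[u]) for u in adj}

# Illustrative example: the complete graph on 4 vertices, which is 3-regular.
k4 = {u: {v for v in range(4) if v != u} for u in range(4)}

constant = {u: 1 for u in k4}
print(apply_adjacency(k4, constant))      # every value is d = 3

g = {0: 1, 1: -1, 2: 0, 3: 0}
print(apply_adjacency(k4, g))             # equals -1 times g
```

The second function sums to zero, so it is orthogonal to the constant function, and here it happens to be an eigenvector with eigenvalue -1.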

In fact, the relationship between \(\lambda_2\) and connectivity
properties of the graph is considerably deeper than
this: roughly speaking, the further away \(\lambda_2\) is from d,
the bigger the expansion parameter c of the graph.
More precisely, it can be shown that c lies between
\(\frac{1}{2}(d - \lambda_2)\) and \(\sqrt{2d(d - \lambda_2)}\). From this it follows that
an infinite family of d-regular graphs is a family of
expanders if and only if there is some constant a > 0
such that the spectral gaps \(d - \lambda_2\) are at least a for
every graph in the family. One of the many reasons
these bounds on c are important is that although, as
we have remarked, it is hard to test whether a graph
is a c-expander, its second largest eigenvalue can be
computed in polynomial time. So we can at least obtain
estimates for how good the expansion properties of a
graph are.
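To make the last point concrete, here is a rough sketch of one way to estimate \(\lambda_2\) and the resulting bounds on c. It uses power iteration on A + dI restricted to functions orthogonal to the constants; this particular method, the function name, the starting vector, and the iteration count are all choices made for illustration rather than anything prescribed by the text.

```python
import math

def second_eigenvalue(adj, d, iterations=500):
    """Estimate lambda_2 of a connected d-regular graph by power iteration.

    The matrix A + dI has eigenvalues lambda_i + d >= 0. After projecting
    out the constant eigenvector, its largest eigenvalue is lambda_2 + d.
    """
    vertices = list(adj)
    n = len(vertices)
    # Deterministic, nonconstant starting vector.
    x = {v: math.sin(1.0 + i) for i, v in enumerate(vertices)}
    estimate = 0.0
    for _ in range(iterations):
        mean = sum(x.values()) / n
        x = {v: x[v] - mean for v in vertices}            # remove constant part
        norm = math.sqrt(sum(t * t for t in x.values()))
        x = {v: x[v] / norm for v in vertices}
        y = {u: d * x[u] + sum(x[v] for v in adj[u]) for u in vertices}
        estimate = sum(x[v] * y[v] for v in vertices)     # Rayleigh quotient
        x = y
    return estimate - d

# Illustrative example: the cycle on 8 vertices (d = 2).
cycle8 = {v: {(v - 1) % 8, (v + 1) % 8} for v in range(8)}
lam2 = second_eigenvalue(cycle8, 2)
lower = (2 - lam2) / 2                   # (d - lambda_2) / 2
upper = math.sqrt(2 * 2 * (2 - lam2))    # sqrt(2d(d - lambda_2))
print(lam2, lower, upper)
```

For the 8-cycle the expansion parameter is 1/2 (an arc of four vertices has two edges leaving it), which indeed lies between the two computed bounds.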

Another important parameter of a d-regular graph
G is the largest absolute value of any eigenvalue apart
from \(\lambda_1\), which we denote by \(\lambda(G)\). If \(\lambda(G)\) is small,
then G behaves in many respects like a random d-regular
graph. For example, let A and B be two disjoint
sets of vertices. If G were random, a small calculation
shows that we would expect the number E(A,B)
of edges from A to B to be about \(d|A|\,|B|/n\). It can be
shown that, for any two disjoint sets in any d-regular
graph G, E(A,B) will differ from this expected amount
by at most \(\lambda(G)\sqrt{|A|\,|B|}\). Therefore, if \(\lambda(G)\) is a small
fraction of d, then between any two reasonably large
sets A and B we get roughly the number of edges that
we expect. This shows that graphs for which \(\lambda(G)\) is
small “behave like random graphs.”

It is natural to ask how small \(\lambda(G)\) can be in d-regular
graphs. Alon and Boppana proved that it is
always at least \(2\sqrt{d-1} - g(n)\) for a certain function
g that tends to zero as n increases. Friedman proved
that almost all d-regular graphs G with n vertices have
\(\lambda(G) \leq 2\sqrt{d-1} + h(n)\), where h(n) tends to zero, so
a typical d-regular graph comes very close to matching
the best possible bound for \(\lambda(G)\). The proof was
a tour de force. Even more remarkably, it is possible
to match the lower bound with explicit constructions:
the famous Ramanujan graphs of Lubotzky, Phillips,
and Sarnak, and, independently, Margulis. They constructed,
for each d such that d - 1 is a prime power, a
family of d-regular graphs G with \(\lambda(G) \leq 2\sqrt{d-1}\).

4 Applications of Expanders 

Perhaps the most obvious use for expanders is in 
communication networks. The fact that expanders are
highly connected means that such a network is highly 
“fault tolerant,” in the sense that one cannot cut off 
part of the network without destroying a large number 
of individual communication lines. Further desirable 
properties of such a network, such as a small diameter, 
follow from an analysis of random walks on expanders. 
A random walk of length m on a d-regular graph G is
a path \(v_0, v_1, \dots, v_m\), where each \(v_i\) is a randomly chosen
neighbor of \(v_{i-1}\). Random walks on graphs can be
used to model many phenomena, and one of the questions
one frequently asks about a random walk is how
rapidly it “mixes.” That is, how large does m have to
be before the probability that \(v_m = v\) is approximately
the same for all vertices v?

If we let \(p_k(v)\) be the probability that \(v_k = v\), then
it is not hard to show that \(p_{k+1} = d^{-1}Ap_k\). In other
words, the transition matrix T of the random walk,
which tells you how the distribution after k + 1 steps
depends on the distribution after k steps, is \(d^{-1}\) times
the adjacency matrix A. Therefore, its largest eigenvalue
is 1, and if \(\lambda(G)\) is small then all other eigenvalues
are small.

Suppose that this is the case, and let p be any probability
distribution [III.73] on the vertices of G. Then
we can write p as a linear combination \(\sum_i u_i\), where
\(u_i\) is an eigenvector of T with eigenvalue \(d^{-1}\lambda_i\). If T
is applied k times, then the new distribution will be
\(\sum_i (d^{-1}\lambda_i)^k u_i\). If \(\lambda(G)\) is small, then \((d^{-1}\lambda_i)^k\) tends
rapidly to zero, except that it equals 1 when i = 1.
In other words, after a short time, the “nonconstant
part” of p goes to zero and we are left with the uniform
distribution.
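The decay of the nonconstant part can be watched numerically. The sketch below applies the rule \(p_{k+1} = d^{-1}Ap_k\) to a walk started at a single vertex; the function names and the choice of the complete graph on 4 vertices as a test case are assumptions made for illustration.

```python
def walk_step(adj, d, p):
    """One step of the walk on a d-regular graph: p_{k+1} = (1/d) A p_k."""
    return {u: sum(p[v] for v in adj[u]) / d for u in adj}

def deviation_after(adj, d, start, steps):
    """Largest difference between p_k and the uniform distribution."""
    p = {v: 1.0 if v == start else 0.0 for v in adj}
    for _ in range(steps):
        p = walk_step(adj, d, p)
    uniform = 1.0 / len(adj)
    return max(abs(p[v] - uniform) for v in adj)

# Complete graph on 4 vertices: d = 3, and every eigenvalue of A other
# than d equals -1, so the deviation shrinks by a factor of 3 per step.
k4 = {u: {v for v in range(4) if v != u} for u in range(4)}
for k in range(6):
    print(k, deviation_after(k4, 3, 0, k))
```

In this example the deviation after k steps is \(\frac{3}{4} \cdot 3^{-k}\), matching the factor \((d^{-1}\lambda_i)^k\) in the text.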

Thus, random walks on expanders mix rapidly. This 
property is at the heart of some of the applications of 
expanders. For example, suppose that V is a large set, f
is a function from V to the interval [0, 1], and we wish
to estimate quickly and accurately the average of f. A
natural idea is to choose a random sample \(v_1, v_2, \dots, v_k\)
of points in V and calculate the average \(k^{-1}\sum_{i=1}^{k} f(v_i)\).
If k is large and the \(v_i\) are chosen independently, then
it is not too hard to prove that this sample average
will almost certainly be close to the true average: the
probability that they differ by more than \(\epsilon\) is at most
\(e^{-\epsilon^2 k}\).

This idea is very simple, but actually implementing 
it requires a source of randomness. In theoretical com- 
puter science, randomness is regarded as a resource, 
and it is desirable to use less of it if one can. The 
above procedure needed about log(|V|) bits of randomness
for each \(v_i\), so k log(|V|) bits in all. Can we
do better? Ajtai, Komlós, and Szemerédi showed that
the answer is yes: big time! What one does is associate
V with the vertices of an explicit expander. Then,
instead of choosing \(v_1, v_2, \dots, v_k\) independently, one
chooses them to be the vertices of a random walk in
this expanding graph, starting at a random point \(v_1\)
of V. The randomness needed for this is far smaller:
log(|V|) bits for \(v_1\) and log(d) bits for each further \(v_i\),
making log(|V|) + k log(d) bits in all. Since V is very
large and d is a fixed constant, this is a big saving: we
essentially pay only for the first sample point.

But is this sample any good? Clearly there is a heavy
dependence between the \(v_i\). However, it can be shown
that nothing is lost in accuracy: again, the probability
that the estimate differs from the true mean by
more than \(\epsilon\) is at most \(e^{-\epsilon^2 k}\). Thus, there are no costs
attached to the big saving in randomness.
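The size of the saving is easy to tabulate. The sketch below just evaluates the two bit counts stated above, with logarithms taken to base 2; the function names and the sample values of |V|, k, and d are hypothetical.

```python
import math

def bits_independent(size_v, k):
    """Random bits for k independent samples from a set of size size_v."""
    return k * math.log2(size_v)

def bits_walk(size_v, k, d):
    """Random bits for a random start plus log2(d) bits per step, as in the text."""
    return math.log2(size_v) + k * math.log2(d)

# Hypothetical numbers: |V| = 2^100, k = 1000 samples, degree d = 8.
print(bits_independent(2 ** 100, 1000))   # about 100000 bits
print(bits_walk(2 ** 100, 1000, 8))       # about 3100 bits
```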

This is just one of a huge number of applications of 
expanders, which include both practical applications 
and applications in pure mathematics. For instance, 
they were used by Gromov to give counterexamples to
certain variants of the famous Baum-Connes conjecture
[IV.19 §4.4]. And certain bipartite graphs called
“lossless expanders” have been used to produce linear
codes with efficient decodings. (See reliable transmission
of information [VII.6] for a description of
what this means.)


III.25 The Exponential and Logarithmic 
Functions 


1 Exponentiation 

The following is a very well-known mathematical 
sequence: 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, .... 
Each term in this sequence is twice the term before, so, 
for instance, 128, the seventh term in the sequence, is
equal to 2 × 2 × 2 × 2 × 2 × 2 × 2. Since repeated multiplications
of this kind occur throughout mathematics, it
is useful to have a less cumbersome notation for them,
so 2 × 2 × 2 × 2 × 2 × 2 × 2 is normally written as \(2^7\),
which we read as “2 to the power 7” or just “2 to the 7.”
More generally, if a is any real number and m is any
positive integer, then \(a^m\) stands for a × a × ··· × a,
where there are m a's in the product. This product is
called “a to the m,” and numbers of the form \(a^m\) are
called the powers of a.

The process of raising a number to a power is known 
as exponentiation. (The number m is called the exponent.)
A fundamental fact about exponentiation is the
following identity:

\[ a^{m+n} = a^m \cdot a^n. \]

This says that exponentiation “turns addition into mul- 
tiplication.” It is easy to see why this identity must be 
true if one looks at a small example and temporarily
reverts to the old, cumbersome notation. For instance,

\[ \begin{aligned} 2^7 &= 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 \\ &= (2 \times 2 \times 2) \times (2 \times 2 \times 2 \times 2) \\ &= 2^3 \times 2^4. \end{aligned} \]

Suppose now that we are asked to evaluate \(2^{3/2}\). At
first sight, the question seems misconceived: an essential
part of the definition of \(2^m\) that has just been given
was that m was a positive integer. The idea of multiplying
one-and-a-half 2s together does not make sense.
However, mathematicians like to generalize, and even if
we cannot immediately make sense of \(2^m\) except when
m is a positive integer, there is nothing to stop us
inventing a meaning for it for a wider class of numbers.

The more natural we make our generalization, the 
more interesting and useful it is likely to be. And the 
way we make it natural is to ensure that at all costs we 
keep the property of “turning addition into multiplication.”
This, it turns out, leaves us with only one sensible
choice for what \(2^{3/2}\) should be. If the fundamental
property is to be preserved, then we must have

\[ 2^{3/2} \cdot 2^{3/2} = 2^{3/2+3/2} = 2^3 = 8. \]

Therefore, \(2^{3/2}\) has to be \(\pm\sqrt{8}\). It turns out to be convenient
to take \(2^{3/2}\) to be positive, so we define \(2^{3/2}\) to
be \(\sqrt{8}\).

A similar argument shows that \(2^0\) should be defined
to be 1: if we wish to keep the fundamental property,
then

\[ 2 = 2^1 = 2^{1+0} = 2^1 \cdot 2^0 = 2 \cdot 2^0. \]

Dividing both sides by 2 gives the answer \(2^0 = 1\).

What we are doing with these kinds of arguments is 
solving a functional equation, that is, an equation where 
the unknown is a function. So that we can see this more 
clearly, let us write f(t) for \(2^t\). The information we are
given is the fundamental property f(t + u) = f(t)f(u)
together with one value, f(1) = 2, to get us started.
From this we wish to deduce as much as we can about f.

It is a nice exercise to show that the two conditions
we have placed on f determine the value of f at every
rational number, at least if f is assumed to be positive.
For instance, to show that f(0) should be 1, we note
that f(0)f(1) = f(1), and we have already shown that
f(3/2) must be \(\sqrt{8}\). The rest of the proof is in a similar
spirit to these arguments, and the conclusion is that
f(p/q) must be the qth root of \(2^p\). More generally, the
only sensible definition of \(a^{p/q}\) is the qth root of \(a^p\).

We have now extracted everything we can from the
functional equation, but we have made sense of \(a^t\) only
if t is a rational number. Can we give a sensible definition
when t is irrational? For example, what would be
the most natural definition of \(2^{\sqrt{2}}\)? Since the functional
equation alone does not determine what \(2^{\sqrt{2}}\) should
be, the way to answer a question like this is to look
for some natural additional property that f might have
that would, together with the functional equation, specify
f uniquely. It turns out that there are two obvious
choices, both of which work. The first is that f should
be an increasing function: that is, if s is less than t, then
f(s) is less than f(t). Alternatively, one can assume
that f is continuous [I.3 §5.2].

Let us see how the first property can in principle be
used to work out \(2^{\sqrt{2}}\). The idea is not to calculate it
directly but to obtain better and better estimates. For
instance, since \(1.4 < \sqrt{2} < 1.5\) the order property tells
us that \(2^{\sqrt{2}}\) should lie between \(2^{7/5}\) and \(2^{3/2}\), and in general
that if \(p/q < \sqrt{2} < r/s\) then \(2^{\sqrt{2}}\) should lie between
\(2^{p/q}\) and \(2^{r/s}\). It can be shown that if two rational numbers
p/q and r/s are very close to each other, then \(2^{p/q}\)
and \(2^{r/s}\) are also close. It follows that as we choose fractions
p/q and r/s that are closer and closer together, so
the resulting numbers \(2^{p/q}\) and \(2^{r/s}\) converge to some
limit, and this limit we call \(2^{\sqrt{2}}\).
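The squeezing procedure can be imitated numerically. In the sketch below the rational bounds are decimal truncations of \(\sqrt{2}\), found with exact integer arithmetic; the function name is hypothetical, and the powers \(2^{p/q}\) are evaluated with ordinary floating-point exponentiation, which is a shortcut for illustration rather than a computation of qth roots from first principles.

```python
import math

def squeeze_two_to_sqrt2(digits):
    """Bracket 2**sqrt(2) between 2**(p/q) and 2**((p+1)/q) with q = 10**digits.

    p is the largest integer with (p/q)**2 <= 2, so p/q < sqrt(2) < (p+1)/q.
    """
    q = 10 ** digits
    p = math.isqrt(2 * q * q)
    return 2 ** (p / q), 2 ** ((p + 1) / q)

for digits in (1, 2, 4, 6):
    print(digits, squeeze_two_to_sqrt2(digits))
```

With one digit the bounds are exactly the pair \(2^{7/5}\) and \(2^{3/2}\) mentioned in the text, and each extra digit narrows the interval by roughly a factor of ten.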

2 The Exponential Function 

One of the hallmarks of a truly important concept in 
mathematics is that it can be defined in many dif- 
ferent but equivalent ways. The exponential function 
exp(x) very definitely has this property. Perhaps the 
most basic way to think of it, though for most purposes 
not the best, is that exp(x) = \(e^x\), where e is a number
whose decimal expansion begins 2.7182818. Why do
we focus on this number? One property that singles it
out is that if we differentiate the function exp(x) = \(e^x\),
then we obtain \(e^x\) again, and e is the only number for
which that is true. Indeed, this leads to a second way of
defining the exponential function: it is the only solution
of the differential equation f'(x) = f(x) that satisfies
the initial condition f(0) = 1.

A third way to define exp(x), and one that is often 
chosen in textbooks, is as the limit of a power series: 

\[ \exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots, \]

known as the Taylor series of exp(x). It is not immediately
obvious that the right-hand side of this definition
gives us some number raised to the power x, which
is why we are using the notation exp(x) rather than
\(e^x\). However, with a bit of work one can verify that it
yields the basic properties exp(x + y) = exp(x) exp(y),
exp(0) = 1, and (d/dx) exp(x) = exp(x).

There is yet another way to define the exponential 
function, and this one comes much closer to telling us 
what it really means. Suppose you wish to invest some 
money for ten years and are given the following choice: 
either you can add 100% to your investment (that is, 
double it) at the end of the ten years, or each year you 
can take whatever you have and increase it by 10%. 
Which would you prefer? 

The second is the better investment because in the
second case the interest is compounded: for instance, if
you start with $100, then after a year you will have $110
and after two years you will have $121. The increase of
$11 in the second year breaks down as 10% interest on
the original $100 plus a further dollar, which is 10%
interest on the interest earned in the first year. Under
the second scheme, the amount of money you end up
with is $100 times \((1.1)^{10}\), since each year it multiplies
by 1.1. The approximate value of \((1.1)^{10}\) is 2.5937, so
you will get almost $260 instead of $200.

What if you compounded your interest monthly? 
Instead of multiplying your investment by \(1 + \frac{1}{10}\) ten
times, you would multiply it by \(1 + \frac{1}{120}\) 120 times. By the
end of ten years your $100 would have been multiplied
by \((1 + \frac{1}{120})^{120}\), which is approximately 2.707. If you
compounded it daily, you could increase this to approximately
2.718, which is suspiciously close to e. In fact,
e can be defined as the limit, as n tends to infinity, of
the number \((1 + \frac{1}{n})^n\).
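The figures quoted above are easy to reproduce. The following sketch evaluates \((1 + \frac{1}{n})^n\) for yearly, monthly, and daily compounding over ten years, and for one much larger n; the function name `compound` is chosen here for illustration.

```python
import math

def compound(n):
    """Growth factor when 100% interest is split into n equal compounding periods."""
    return (1 + 1 / n) ** n

for n in (1, 10, 120, 3650, 10 ** 6):
    print(n, compound(n))
print("e =", math.e)
```

The values increase with n and approach e, with n = 10 and n = 120 giving the 2.5937 and 2.707 mentioned in the text.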

It is not instantly obvious that this expression really 
does tend to a limit. For any fixed power m, the limit 
of \((1 + \frac{1}{n})^m\) as n tends to infinity is 1, while for any
fixed n, the limit as m tends to infinity is \(\infty\). When it
comes to \((1 + \frac{1}{n})^n\), the increase in the power just compensates
for the decrease in the number \(1 + \frac{1}{n}\) and we
get a limit between 2 and 3. If x is any real number, then
\((1 + \frac{x}{n})^n\) also converges to a limit, and this we define
to be exp(x).

Here is a sketch of an argument that shows that if we
define exp(x) this way, then exp(x) exp(y) = exp(x +
y), the main property we need if our definition is to be
a good one. Let us take a very large n and look at the
number

\[ \left(1 + \frac{x}{n}\right)^n \left(1 + \frac{y}{n}\right)^n = \left(1 + \frac{x}{n} + \frac{y}{n} + \frac{xy}{n^2}\right)^n. \]

Now the ratio of \(1 + x/n + y/n + xy/n^2\) to \(1 + x/n + y/n\)
is smaller than \(1 + xy/n^2\), and \((1 + xy/n^2)^n\) can
be shown to converge to 1 (as here the increase in n
is not enough to compensate for the rapid decrease in
\(xy/n^2\)). Therefore, for large n the number we have is
very close to

\[ \left(1 + \frac{x + y}{n}\right)^n. \]

Letting n tend to infinity, we deduce the result.

3 Extending the Definition to 
Complex Numbers 

If we think of exp(x) as \(e^x\), then the idea of generalizing
the definition to complex numbers seems hopeless: our
intuition tells us nothing, the functional equation does
not help, and we cannot use continuity or order relations
to determine it for us. However, both the power
series and the compound-interest definitions can be
generalized easily. If z is a complex number, then the
most usual definition of exp(z) is

\[ 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots. \]

Setting \(z = i\theta\), for a real number \(\theta\), and splitting the
resulting expression into its real and imaginary parts,
we obtain

\[ \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right), \]

which, using the power-series expansions for \(\cos(\theta)\)
and \(\sin(\theta)\), tells us that \(\exp(i\theta) = \cos(\theta) + i\sin(\theta)\),
the formula for the point with argument \(\theta\) on the unit
circle in the complex plane. In particular, if we take
\(\theta = \pi\), we obtain the famous formula \(e^{i\pi} = -1\) (since
\(\cos(\pi) = -1\) and \(\sin(\pi) = 0\)).
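The power-series definition can be tested directly with complex arithmetic. The sketch below sums the first 40 terms of the series; the function name and the number of terms are choices made for illustration, and 40 terms is far more than needed for arguments of modulus around \(\pi\).

```python
import math

def exp_series(z, terms=40):
    """Partial sum 1 + z + z^2/2! + ... with the given number of terms."""
    total = 0j
    term = 1 + 0j            # current term z^n / n!, starting with n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # turn z^n / n! into z^(n+1) / (n+1)!
    return total

print(exp_series(1j * math.pi))                      # very close to -1
print(exp_series(1j), math.cos(1), math.sin(1))      # real and imaginary parts agree
```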

This formula is so striking that one feels that it ought
to hold for a good reason, rather than being a mere
fact that one notices after carrying out some formal
algebraic manipulations. And indeed there is a good
reason. To see it, let us return to the compound-interest
idea and define exp(z) to be the limit of \((1 + z/n)^n\) as
n tends to infinity. Let us concentrate just on the case
where \(z = i\pi\): why should \((1 + i\pi/n)^n\) be close to -1
when n is very large?

To answer this, let us think geometrically. What is
the effect on a complex number of multiplying it by
\(1 + i\pi/n\)? On the Argand diagram this number is very
close to 1 and vertically above it. Because the vertical
line through 1 is tangent to the unit circle, this means that
the number is very close indeed to a number that lies on
the circle and has argument \(\pi/n\) (since the argument
of a number on the circle is the length of the circular
arc from 1 to that number, and in this case the circular
arc is almost straight). Therefore, multiplication by
\(1 + i\pi/n\) is very well approximated by rotation through an
angle of \(\pi/n\). Doing this n times results in a rotation by
\(\pi\), which is the same as multiplication by -1. The same
argument can be used to justify the formula
\(\exp(i\theta) = \cos(\theta) + i\sin(\theta)\).
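The geometric argument can be checked numerically by raising \(1 + i\pi/n\) to the nth power for increasing n. The function name below is hypothetical; the modulus of the result is slightly larger than 1 because each factor lies just outside the unit circle.

```python
import math

def compound_rotation(n):
    """Return (1 + i*pi/n)**n, which approximates exp(i*pi) for large n."""
    return (1 + 1j * math.pi / n) ** n

for n in (10, 1000, 10 ** 6):
    value = compound_rotation(n)
    print(n, value, abs(value + 1))
```

The distance from -1 shrinks roughly in proportion to 1/n.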

Continuing in this vein, let us see why the derivative 
of the exponential function is the exponential function. 
We know already that exp(z + w) = exp(z) exp(w), so
the derivative of exp at z is the limit as w tends to
zero of exp(z)(exp(w) - 1)/w. It is therefore enough
to show that exp(w) - 1 is very close to w when w
is small. To get a good idea of exp(w) we should take
a large n and consider \((1 + w/n)^n\). It is not hard to
prove that this is indeed close to 1 + w, but here is an
informal argument instead. Suppose that you have a
bank account that offers a tiny rate of interest over a
year, say 0.5%. How much better would you do if you
could compound this interest monthly? The answer is
not very much: if the total amount of interest is very
small, then the interest on the interest is negligible.
This, in essence, is why \((1 + w/n)^n\) is approximately
1 + w when w is small.

One can extend the definition of the exponential 
function yet further. The main ingredients one needs 
are addition, multiplication, and the possibility of lim- 
iting arguments. So, for example, if x is an element of a 
Banach algebra [III.12] A, then exp(x) makes sense.
(Here, the power series definition is the easiest, though 
not necessarily the most enlightening.) 

4 The Logarithm Function 

Natural logarithms, like exponentials, can be defined in
many ways. Here are three.

(i) The function log is the inverse of the function
exp. That is, if t is a positive real number, then
the statement u = log(t) is equivalent to the
statement t = exp(u).

(ii) Let t be a positive real number. Then

\[ \log(t) = \int_1^t \frac{dx}{x}. \]

(iii) If |x| < 1 then \(\log(1 + x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \cdots\).
This defines log(t) for 0 < t < 2. If \(t \geq 2\) then
log(t) can be defined as -log(1/t).
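Definition (iii) can be turned into a short computation and compared with definition (i) as implemented by the standard library. The function name and the fixed number of terms are assumptions made for illustration; the series converges slowly when t is close to 0 or to 2, so this is a sketch rather than a practical method.

```python
import math

def log_series(t, terms=200):
    """log(t) via the series for log(1 + x), using -log(1/t) when t >= 2."""
    if t <= 0:
        raise ValueError("t must be positive")
    if t >= 2:
        return -log_series(1 / t, terms)
    x = t - 1  # now |x| < 1
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))

for t in (0.5, 1.5, 10.0):
    print(t, log_series(t), math.log(t))
```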

The most important feature of the logarithmic function
is a functional equation that is the reverse of the
functional equation for exp, namely log(st) = log(s) +
log(t). That is, whereas exp turns addition into multiplication,
log turns multiplication into addition. A more
formal way of putting this is that \(\mathbb{R}\) forms a group
under addition, and \(\mathbb{R}_+\), the set of positive real numbers,
forms a group under multiplication. The function
exp is an isomorphism from \(\mathbb{R}\) to \(\mathbb{R}_+\), and log,
its inverse, is an isomorphism from \(\mathbb{R}_+\) to \(\mathbb{R}\). Thus, in
a sense the two groups have the same structure, and
the exponential and logarithmic functions demonstrate
this.

Let us use the first definition of log to see why log(st)
must equal log(s) + log(t). Write s = exp(a) and t =
exp(b). Then log(s) = a, log(t) = b, and

\[ \begin{aligned} \log(st) &= \log(\exp(a)\exp(b)) \\ &= \log(\exp(a + b)) \\ &= a + b. \end{aligned} \]

The result follows.

In general, the properties of log closely follow those
of exp. However, there is one very important difference,
which is a complication that arises when one
tries to extend log to the complex numbers. At first
it seems quite easy: every complex number z can be
written as $re^{i\theta}$ for some nonnegative real number r
and some θ (the modulus and argument of z, respectively).
If $z = re^{i\theta}$ then log(z), one might think, should
be log(r) + iθ (using the functional equation for log
and the fact that log inverts exp). The problem with
this is that θ is not uniquely determined. For instance,
what is log(1)? Normally we would like to say 0, but
we could, perversely, say that $1 = e^{2\pi i}$ and claim that
log(1) = 2πi.

Because of this difficulty, there is no single best way
to define the logarithmic function on the entire complex
plane, even if 0, a number that does not have a
logarithm however you look at it, is removed. One
convention is to write $z = re^{i\theta}$ with r > 0 and 0 ≤ θ < 2π,
which can be done in exactly one way, and then define
log(z) to be log(r) + iθ. However, this function is not
continuous: as you cross the positive real axis, the
argument jumps by 2π and the logarithm jumps by
2πi.
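
The jump across the positive real axis can be observed directly. The sketch below implements the convention just described (0 ≤ θ < 2π); the name `log_branch` and the test points are illustrative assumptions. Note that Python's own `cmath.log` uses a different convention, with the argument in [-π, π], so it is not used here.

```python
import cmath
import math

def log_branch(z):
    """log(z) = log(r) + i*theta with r > 0 and 0 <= theta < 2*pi."""
    if z == 0:
        raise ValueError("0 has no logarithm")
    r = abs(z)
    theta = cmath.phase(z)       # phase() returns a value in [-pi, pi]
    if theta < 0:
        theta += 2 * math.pi     # shift into [0, 2*pi)
    return complex(math.log(r), theta)

eps = 1e-9
just_above = log_branch(complex(1.0, eps))    # argument close to 0
just_below = log_branch(complex(1.0, -eps))   # argument close to 2*pi
jump = just_below - just_above                # approximately 2*pi*i
```

Whichever branch is chosen, exponentiating the result recovers z; only the location of the discontinuity moves.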

Remarkably, this difficulty, far from being a blow 
to mathematics, is an entirely positive phenomenon 
that lies behind several remarkable theorems in com- 
plex analysis, such as Cauchy’s residue theorem, which 
allows one to evaluate very general path integrals.

If f is a periodic function with period 1, then one can
obtain a great deal of useful information about f by
calculating its Fourier coefficients (see the Fourier
transform [III.27] for a discussion of why). This is true
for both theoretical and practical reasons, and because
of the latter it is highly desirable to have a good way of
computing Fourier coefficients quickly.

The rth Fourier coefficient of f is given by the
formula

$$\hat f(r) = \int_0^1 f(x)e^{-2\pi i r x}\,dx.$$

If we do not have an explicit formula for the integral
(as would be the case, for instance, if f were derived
from some physical signal rather than a mathematical
formula), then we will want to approximate this
integral numerically, and a natural way to do that is
to discretize it: that is, turn it into a sum of the form
$N^{-1}\sum_{n=0}^{N-1} f(n/N)e^{-2\pi i rn/N}$. If f is not too wildly
oscillating and r is not too big, then this should be a good
approximation.
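
As a rough illustration of this discretization, the sum can be evaluated directly. The function name `fourier_coefficient` and the test signal below are assumptions introduced for the example; for a trigonometric polynomial whose frequencies are small compared with N, the sum reproduces the exact coefficients up to rounding.

```python
import cmath
import math

def fourier_coefficient(f, r, N):
    """Approximate the r-th Fourier coefficient of a 1-periodic f by
    N^{-1} * sum_{n=0}^{N-1} f(n/N) * exp(-2*pi*i*r*n/N)."""
    return sum(f(n / N) * cmath.exp(-2j * math.pi * r * n / N)
               for n in range(N)) / N

# Hypothetical test signal: cos(2*pi*x) + 0.5*sin(6*pi*x).
# Its exact coefficients are 1/2 at r = 1 and r = -1, -i/4 at r = 3, i/4 at r = -3.
def signal(x):
    return math.cos(2 * math.pi * x) + 0.5 * math.sin(6 * math.pi * x)

c1 = fourier_coefficient(signal, 1, 64)
c3 = fourier_coefficient(signal, 3, 64)
```

Replacing r by r + N leaves the computed value unchanged (up to rounding), which is one reason r can be treated as an integer mod N.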

The sum above is unchanged if we add a multiple
of N to r, and it involves only the values
of f at points of the form n/N. Moreover, the periodicity
of f tells us that adding a multiple of N to n also
makes no difference. So we can regard both n and r as
belonging to the group $\mathbb{Z}_N$ of integers mod N (see
modular arithmetic [III.60]). Let us change our notation
to one that reflects this. Given a function g defined on
$\mathbb{Z}_N$ we define the discrete Fourier transform of g to be
the function $\hat g$, also defined on $\mathbb{Z}_N$, which is given by
the formula

$$\hat g(r) = N^{-1}\sum_{n\in\mathbb{Z}_N} g(n)\,\omega^{-rn}, \tag{1}$$

where we are writing ω for $e^{2\pi i/N}$, so that $\omega^{-rn} = e^{-2\pi i rn/N}$.
Note that the sum over n could be regarded
as a sum from 0 to N - 1 just as above; the other
notational change is that we have written g(n) instead of
f(n/N).

The discrete Fourier transform can be thought of
as multiplying a column vector (corresponding to the
function g) by an N × N matrix (with entries $N^{-1}\omega^{-rn}$
for each r and n). Therefore it can be calculated using
about $N^2$ arithmetical operations. The fast Fourier
transform arises from the observation that the sum in
(1) has symmetry properties that allow it to be
calculated much more efficiently. This is most easily seen
when N is a power of 2, and to make it even easier we
shall look at the case N = 8. The sums to be evaluated
are then

$$g(0) + \omega^{r}g(1) + \omega^{2r}g(2) + \cdots + \omega^{7r}g(7)$$

for each r between 0 and 7. (Here the factor $N^{-1}$ has been
dropped and r has been replaced by -r, which merely permutes
the eight values to be computed.) Now a sum like this can be
rewritten as

$$g(0) + \omega^{2r}g(2) + \omega^{4r}g(4) + \omega^{6r}g(6)
+ \omega^{r}\bigl(g(1) + \omega^{2r}g(3) + \omega^{4r}g(5) + \omega^{6r}g(7)\bigr),$$

which is interesting because

$$g(0) + \omega^{2r}g(2) + \omega^{4r}g(4) + \omega^{6r}g(6)$$

and

$$g(1) + \omega^{2r}g(3) + \omega^{4r}g(5) + \omega^{6r}g(7)$$

are themselves values of discrete Fourier transforms.
For instance, if we set h(n) = g(2n) for 0 ≤ n ≤ 3,
and write ψ for $\omega^2 = e^{2\pi i/4}$, then the first expression
equals $h(0) + \psi^{r}h(1) + \psi^{2r}h(2) + \psi^{3r}h(3)$. If we think
of h as being defined on $\mathbb{Z}_4$, then this is (with the same
simplifications) precisely the formula for $\hat h(r)$.

A similar remark applies to the second expression, 
so if we can calculate the discrete Fourier transforms 
of the “even part” of g and the “odd part” of g, then it 
will be very straightforward to obtain each value of the 
Fourier transform of g itself: it will be a linear combi- 
nation of values of the transforms of the two parts of 
g. Thus, if N is even and we write F(N) for the number 
of operations needed to calculate the discrete Fourier 
transform of a function defined on $\mathbb{Z}_N$, we obtain a
recurrence of the form 

F(N) = 2F(N/2) + CN. 

The interpretation of this is that in order to work out
the N values of the transform of a function on $\mathbb{Z}_N$, it is
enough to work out two such transforms for functions
on $\mathbb{Z}_{N/2}$ and work out N linear combinations.

If N is a power of 2, then we can iterate this: F(N/2)
will be at most 2F(N/4) + CN/2, and so on. There are
$\log_2 N$ levels of recursion, and the linear combinations at
each level cost about CN operations in total, so it is not
hard to show as a result that F(N) is at most CN log N
for some constant C, a considerable improvement on
$CN^2$. If N is not a power of 2, then the above argument
does not work, but there are modifications of
the method that do, and that lead to similar efficiency
gains. (Indeed, this is true for the Fourier transform on
an arbitrary finite Abelian group.)
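
The even/odd splitting described above can be written as a short recursion. This is a sketch for N a power of 2 only, using the normalization of formula (1); the names `dft_naive` and `fft` are chosen here, and no attempt is made at the in-place or iterative optimizations that practical implementations use.

```python
import cmath
import math

def dft_naive(g):
    """Formula (1): N^{-1} * sum_n g(n) * omega^{-rn}, using about N^2 operations."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * math.pi * r * n / N) for n in range(N)) / N
            for r in range(N)]

def fft(g):
    """The same transform via the even/odd split; len(g) must be a power of 2."""
    N = len(g)
    if N == 1:
        return [complex(g[0])]
    even = fft(g[0::2])   # transform of n -> g(2n) on Z_{N/2}
    odd = fft(g[1::2])    # transform of n -> g(2n + 1) on Z_{N/2}
    out = [0j] * N
    for r in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * r / N) * odd[r]   # omega^{-r} times odd part
        # The factor 1/2 converts the (N/2)^{-1} normalization into N^{-1}.
        out[r] = (even[r] + twiddle) / 2
        out[r + N // 2] = (even[r] - twiddle) / 2
    return out
```

Each call does a constant amount of work per output value on top of two half-size calls, which is the recurrence F(N) = 2F(N/2) + CN.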

Once we can calculate Fourier transforms efficiently,
there are other calculations that immediately become
easy as well. A simple example is the inverse Fourier
transform, which has a formula very similar to that
of the Fourier transform and can therefore be calculated
in a similar way. Another calculation that becomes
easy is the convolution of two sequences, which is
defined as follows. If $a = (a_0, a_1, a_2, \ldots, a_m)$ and
$b = (b_0, b_1, b_2, \ldots, b_n)$ are two sequences, then their
convolution is the sequence $c = (c_0, c_1, c_2, \ldots, c_{m+n})$, where
each $c_r$ is defined to be $a_0 b_r + a_1 b_{r-1} + \cdots + a_r b_0$.
This sequence is denoted by $a * b$. One of the most
important properties of Fourier transforms is that they
“convert convolutions into multiplication.” That is, if
we find a suitable way of regarding a and b as functions
on $\mathbb{Z}_N$, then the Fourier transform of $a * b$ is the
function $r \mapsto \hat a(r)\hat b(r)$. Therefore, to work out $a * b$ we
can work out $\hat a$ and $\hat b$, multiply them together for each
r, and take the inverse Fourier transform of the result.
All stages of this calculation are quick, so calculating
convolutions is quick.
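
A sketch of this procedure follows. It makes two assumptions that are not in the text: the sequences are padded with zeros to a power-of-2 length N that is at least m + n + 1 (one "suitable way" of regarding them as functions on $\mathbb{Z}_N$), and the factor $N^{-1}$ is applied once at the end, in the inverse transform, rather than in the forward transform as in formula (1). The function names are illustrative.

```python
import cmath
import math

def transform(g, sign=-1):
    """Radix-2 recursion for sum_n g(n) * exp(sign*2*pi*i*r*n/N), unnormalized."""
    N = len(g)
    if N == 1:
        return [complex(g[0])]
    even, odd = transform(g[0::2], sign), transform(g[1::2], sign)
    out = [0j] * N
    for r in range(N // 2):
        t = cmath.exp(sign * 2j * math.pi * r / N) * odd[r]
        out[r], out[r + N // 2] = even[r] + t, even[r] - t
    return out

def convolve_direct(a, b):
    """c_r = a_0 b_r + a_1 b_{r-1} + ... + a_r b_0, straight from the definition."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def convolve_fft(a, b):
    """Pad, transform both sequences, multiply pointwise, transform back."""
    size = len(a) + len(b) - 1
    N = 1
    while N < size:
        N *= 2
    A = transform(list(a) + [0] * (N - len(a)))
    B = transform(list(b) + [0] * (N - len(b)))
    C = transform([x * y for x, y in zip(A, B)], sign=+1)
    # Integer inputs are assumed, so rounding removes floating-point noise.
    return [round((v / N).real) for v in C[:size]]
```

For small integer sequences the two functions should return the same list; the second needs on the order of N log N operations rather than (m + 1)(n + 1).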

This immediately leads to a quick way of multiplying
the two polynomials $a_0 + a_1 x + \cdots + a_m x^m$ and
$b_0 + b_1 x + \cdots + b_n x^n$ together, since the coefficients of
the product are given by the sequence c = a * b. If all
the $a_i$ are between 0 and 9, it is a quick process to evaluate
the product polynomial at x = 10 (since none of the
coefficients $c_r$ will have many digits), so we also have
a method of multiplying two n-digit integers together
that is far faster than long multiplication. These are
two of the huge number of applications of the fast
Fourier transform. A more direct source of applications
occurs in engineering, where one frequently wishes to
analyze a signal by looking at its Fourier transform. A
very surprising application is to quantum computation
[III.76]: a famous result of Peter Shor is that one
can use a quantum computer to factorize large integers 
very quickly; this algorithm depends in an essential way 
on the fast Fourier transform, but uses the power of 
quantum computing in an almost miraculous way to 
divide the N log N steps into N lots of log N steps that 
can be carried out “in parallel.” 
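
The reduction of integer multiplication to convolution can be sketched as follows. For brevity the convolution here is computed directly from its definition, so this version is no faster than long multiplication; the speed-up comes only when a fast (FFT-based) convolution is substituted. The helper names and the restriction to non-negative integers are assumptions of the example.

```python
def digits(n):
    """Little-endian decimal digits of a non-negative integer: 314 -> [4, 1, 3]."""
    return [int(ch) for ch in reversed(str(n))]

def convolve(a, b):
    """Coefficients of the product of the polynomials with coefficients a and b."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def multiply(x, y):
    """Multiply by convolving digit sequences, then evaluating at 10."""
    c = convolve(digits(x), digits(y))
    return sum(cr * 10 ** r for r, cr in enumerate(c))
```

The coefficients $c_r$ may exceed 9, but evaluating the polynomial at 10 performs the carrying automatically.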


III.27 The Fourier Transform 

Terence Tao 


Let f be a function from $\mathbb{R}$ to $\mathbb{R}$. Typically, there is
not much that one can say about f, but certain functions
have useful symmetry properties. For instance,
f is called even if f(-x) = f(x) for every x, and it
is called odd if f(-x) = -f(x) for every x. Furthermore,
every function f can be written as a superposition
of an even part, $f_e$, and an odd part, $f_o$. For instance,
the function $f(x) = x^3 + 3x^2 + 3x + 1$ is neither even
nor odd, but it can be written as $f_e(x) + f_o(x)$, where
$f_e(x) = 3x^2 + 1$ and $f_o(x) = x^3 + 3x$. For a general
function f, the decomposition is unique and is
given by the formulas $f_e(x) = \tfrac{1}{2}(f(x) + f(-x))$ and
$f_o(x) = \tfrac{1}{2}(f(x) - f(-x))$.
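
The decomposition formulas translate directly into code. In this small sketch the helper names `even_part` and `odd_part` are chosen here; the polynomial is the example from the text.

```python
def even_part(f):
    """Return the function x -> (f(x) + f(-x)) / 2."""
    return lambda x: 0.5 * (f(x) + f(-x))

def odd_part(f):
    """Return the function x -> (f(x) - f(-x)) / 2."""
    return lambda x: 0.5 * (f(x) - f(-x))

def f(x):
    return x**3 + 3 * x**2 + 3 * x + 1

f_e, f_o = even_part(f), odd_part(f)
# f_e should agree with 3x^2 + 1 and f_o with x^3 + 3x at every x.
```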

What are the symmetry properties enjoyed by even
and odd functions? A useful way to regard them is as
follows. We have a group of two transformations of the
real line: one is the identity map $\iota : x \mapsto x$ and the
other is the reflection $\rho : x \mapsto -x$. Now any transformation
φ of the real line gives rise to a transformation
of the functions defined on the real line: given a
function f, the transformed function is the function
g(x) = f(φ(x)). In the case at hand, if φ = ι then the
transformed function is just f(x), while if φ = ρ then
it is f(-x). If f is either even or odd, then both the
transformed functions are scalar multiples of the original
function f. In particular, when φ = ρ, the transformed
function is f(x) when f is even (so the scalar
multiple is 1) and -f(x) when f is odd (so the scalar
multiple is -1).

The procedure just described can be thought of as
a very simple prototype of the general notion of a
Fourier transform. Very broadly speaking, a Fourier
transform is a systematic way to decompose “generic”
functions into a superposition of “symmetric” functions.
These symmetric functions are usually quite
explicitly defined: for instance, one of the most important
examples is a decomposition into the trigonometric
functions [III.94] sin(nx) and cos(nx). They
are also often related to physical concepts such as
frequency or energy. The symmetry will usually be associated
with a group [I.3 §2.1] G, which is usually Abelian.
(In the case considered above, it is the two-element
group.) Indeed, the Fourier transform is a fundamental
tool in the study of groups, and more precisely in the
representation theory [IV.12] of groups, which concerns
different ways in which a group can be regarded
as a group of symmetries. It is also related to topics in
linear algebra, such as the representation of a vector as
linear combinations of an orthonormal basis [III.37],
or as linear combinations of eigenvectors [I.3 §4.3] of
a matrix or linear operator [III.52].

For a more complicated example, let us fix a positive
integer n and let us define a systematic way of decomposing
functions from $\mathbb{C}$ to $\mathbb{C}$, that is, complex-valued
functions defined on the complex plane. If f is such a
function and j is an integer between 0 and n - 1, then
we say that f is a harmonic of order j if it has the following
property. Let $\omega = e^{2\pi i/n}$, so that ω is a primitive
nth root of 1 (meaning that $\omega^n = 1$ but no smaller
positive power of ω gives 1). Then $f(\omega z) = \omega^j f(z)$
for every $z \in \mathbb{C}$. Notice that if n = 2, then ω = -1, so
when j = 0 we recover the definition of an even function
and when j = 1 we recover the definition of an odd
function. In fact, inspired by this, we can give a general
formula for a decomposition of f into harmonics,
which again turns out to be unique. If we define

$$f_j(z) = \frac{1}{n}\sum_{k=0}^{n-1} f(\omega^k z)\,\omega^{-jk},$$

then it is a simple exercise to prove that

$$f(z) = \sum_{j=0}^{n-1} f_j(z)$$

for every z (use the fact that $\sum_{j=0}^{n-1}\omega^{-jk} = n$ if k = 0
and 0 otherwise), and that $f_j(\omega z) = \omega^j f_j(z)$ for every
z. Thus, f can be decomposed as a sum of harmonics.
The group associated with this Fourier transform
is the multiplicative group of the nth roots of unity
$1, \omega, \ldots, \omega^{n-1}$, or the cyclic group of order n. The root
$\omega^j$ is associated with the rotation of the complex plane
through an angle of $2\pi j/n$.
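
The decomposition into harmonics can be checked on a concrete function. In the sketch below, the name `harmonic` and the test polynomial (with n = 3) are assumptions made for illustration; for a polynomial, the harmonic of order j should collect exactly the monomials whose degree is congruent to j mod n.

```python
import cmath
import math

def harmonic(f, j, n):
    """Return f_j, where f_j(z) = (1/n) * sum_{k=0}^{n-1} f(omega^k z) * omega^{-jk}
    and omega = exp(2*pi*i/n)."""
    omega = cmath.exp(2j * math.pi / n)
    return lambda z: sum(f(omega**k * z) * omega**(-j * k) for k in range(n)) / n

# Hypothetical test function, decomposed with n = 3:
# order 0 -> 1, order 1 -> 2z + 5z^4, order 2 -> z^2.
def f(z):
    return 1 + 2 * z + z**2 + 5 * z**4

parts = [harmonic(f, j, 3) for j in range(3)]
```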

Now let us consider infinite groups. Let f be a
complex-valued function defined on the unit circle
$\mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$. To avoid technical issues we shall
assume that f is smooth, that is, infinitely differentiable.
Now if f is a function of the simple form
$f(z) = cz^n$ for some integer n and some constant
c, then f will have rotational symmetry of order n.
That is, if $\omega = e^{2\pi i/n}$ again, then f(ωz) = f(z) for
all complex numbers z. After our earlier examples, it
should come as no surprise that an arbitrary smooth
function f can be expressed as a superposition of such
rotationally symmetric functions. Indeed, one can write

$$f(z) = \sum_{n=-\infty}^{\infty} \hat f(n) z^n,$$

where the numbers $\hat f(n)$, called the Fourier coefficients
of f at the frequencies n, are given by the formula

$$\hat f(n) = \frac{1}{2\pi}\int_0^{2\pi} f(e^{i\theta})e^{-in\theta}\,d\theta.$$

This formula can be thought of as the limiting case n → ∞
of the previous decomposition, restricted to the unit
circle. It can also be regarded as a generalization of the
Taylor series expansion of a holomorphic function
[I.3 §5.6]. If f is holomorphic on the closed unit disk
$\{z \in \mathbb{C} : |z| \le 1\}$, then one can write

$$f(z) = \sum_{n=0}^{\infty} a_n z^n,$$

where the Taylor coefficient $a_n$ is given by the formula

$$a_n = \frac{1}{2\pi i}\int_{|z|=1}\frac{f(z)}{z^{n+1}}\,dz.$$


In general, there are very strong links between Fourier 
analysis and complex analysis. 

When f is smooth, then its Fourier coefficients decay
to zero very quickly and it is easy to show that the
Fourier series $\sum_{n=-\infty}^{\infty}\hat f(n)z^n$ converges. The issue
becomes more subtle if f is not smooth (for instance,
if it is merely continuous). Then one must be careful to
specify the precise sense in which the series converges.
In fact, a significant portion of harmonic analysis
[IV.18] is devoted to questions of this kind, and to
developing tools for answering them.

The group of symmetries associated with this version
of Fourier analysis is the circle group $\mathbb{T}$. (Notice
that we can think of the number $e^{i\theta}$ both as a point in
the circle and as a rotation through an angle of θ. Thus,
the circle can be identified with its own group of
rotational symmetries.) But there is a second group that is
important here as well, namely the additive group $\mathbb{Z}$ of
all integers. If we take two of our basic symmetric
functions, $z^m$ and $z^n$, and multiply them together, then we
obtain the function $z^{m+n}$, so the map $n \mapsto z^n$ is an
isomorphism from $\mathbb{Z}$ to the set of all these functions under
multiplication. The group $\mathbb{Z}$ is known as the Pontryagin
dual to $\mathbb{T}$.

In the theory of partial differential equations and in
related areas of harmonic analysis, the most important
Fourier transform is defined on the Euclidean space $\mathbb{R}^d$.
Among all functions $f : \mathbb{R}^d \to \mathbb{C}$, the ones considered to
be “basic” are the plane waves $f(x) = c_\xi e^{2\pi i x\cdot\xi}$, where
$\xi \in \mathbb{R}^d$ is a vector (known as the frequency of the plane
wave), $x\cdot\xi$ is the dot product between the position
x and the frequency ξ, and $c_\xi$ is a complex number
(whose magnitude is the amplitude of the plane wave).
Notice that sets of the form $H_\lambda = \{x : x\cdot\xi = \lambda\}$ are
(hyper)planes orthogonal to ξ, and on each such set the
value of f(x) is constant. Moreover, the value taken by
f on $H_\lambda$ is always equal to the value taken on $H_{\lambda+1}$,
since $e^{2\pi i(\lambda+1)} = e^{2\pi i\lambda}$.
This explains the name “plane waves.” It turns out that
if a function f is sufficiently “nice” (e.g., smooth and
rapidly decreasing as x gets large), then it can be
represented uniquely as the superposition of plane waves,
where a “superposition” is now interpreted as an
integral rather than a summation. More precisely, we have
the formulas¹

$$f(x) = \int_{\mathbb{R}^d}\hat f(\xi)e^{2\pi i x\cdot\xi}\,d\xi,$$

where

$$\hat f(\xi) = \int_{\mathbb{R}^d} f(x)e^{-2\pi i x\cdot\xi}\,dx.$$

The function $\hat f(\xi)$ is known as the Fourier transform
of f, and the first formula is known as the Fourier
inversion formula. These two formulas show how to
determine the Fourier-transformed function from the
original function and vice versa. One can view the quantity
$\hat f(\xi)$ as the extent to which the function f contains
a component that oscillates at frequency ξ. As it turns
out, there is no difficulty in justifying the convergence
of these integrals when f is sufficiently nice, though
the issue again becomes more subtle for functions that
are somewhat rough or slowly decaying. In this case,
the underlying group is the Euclidean group $\mathbb{R}^d$ (which
can also be thought of as the group of d-dimensional
translations); note that both the position variable x and
the frequency variable ξ are contained in $\mathbb{R}^d$, so $\mathbb{R}^d$ is
also the Pontryagin dual group in this setting.²

One major application of the Fourier transform lies in
understanding various linear operations on functions,
such as, for instance, the Laplacian on $\mathbb{R}^d$. Given a
function $f : \mathbb{R}^d \to \mathbb{C}$, its Laplacian Δf is defined by the
formula

$$\Delta f(x) = \sum_{j=1}^{d}\frac{\partial^2 f}{\partial x_j^2}(x),$$

where we think of the vector x in coordinate form,
$x = (x_1, \ldots, x_d)$, and of f as a function $f(x_1, \ldots, x_d)$ of d
real variables. To avoid technicalities let us consider
only those functions that are smooth enough for the
above formula to make sense without any difficulty.

In general, there is no obvious relationship between a
function f and its Laplacian Δf. But when f is a plane
wave such as $f(x) = e^{2\pi i x\cdot\xi}$, then there is a very simple
relationship:

$$\Delta e^{2\pi i x\cdot\xi} = -4\pi^2|\xi|^2 e^{2\pi i x\cdot\xi}.$$

That is, the effect of the Laplacian on the plane wave
$e^{2\pi i x\cdot\xi}$ is to multiply it by the scalar $-4\pi^2|\xi|^2$. In
other words, the plane wave is an eigenfunction³ for
the Laplacian Δ, with eigenvalue $-4\pi^2|\xi|^2$. (More generally,
plane waves will be eigenfunctions for any linear
operation that commutes with translations.) Therefore,
the Laplacian, when viewed through the lens of the
Fourier transform, is very simple: the Fourier transform
lets one write an arbitrary function as a superposition
of plane waves, and the Laplacian has a very simple
effect on each plane wave. To be explicit about it,

$$\begin{aligned}
\Delta f(x) &= \Delta\int_{\mathbb{R}^d}\hat f(\xi)e^{2\pi i x\cdot\xi}\,d\xi \\
&= \int_{\mathbb{R}^d}\hat f(\xi)\,\Delta e^{2\pi i x\cdot\xi}\,d\xi \\
&= \int_{\mathbb{R}^d}(-4\pi^2|\xi|^2)\hat f(\xi)e^{2\pi i x\cdot\xi}\,d\xi,
\end{aligned}$$

which gives us a formula for the Laplacian of a general
function. Here we have interchanged the Laplacian
Δ with an integral; this can be rigorously justified for
suitably nice f, but we omit the details.

1. In some texts, the Fourier transform is defined slightly
differently, with factors such as 2π and -1 being moved to other
places. These notational differences have some minor benefits and
drawbacks, but they are all equivalent to each other.

2. This is because of our reliance on the dot product; if one did
not want to use this dot product, the Pontryagin dual would instead
be $(\mathbb{R}^d)^*$, the dual vector space to $\mathbb{R}^d$. But this subtlety is not too

This formula represents Δf as a superposition of
plane waves. But any such representation is unique, and
the Fourier inversion formula tells us that

$$\Delta f(x) = \int_{\mathbb{R}^d}\widehat{\Delta f}(\xi)e^{2\pi i x\cdot\xi}\,d\xi.$$

Therefore,

$$\widehat{\Delta f}(\xi) = (-4\pi^2|\xi|^2)\hat f(\xi),$$

a fact that can also be derived directly from the
definition of the Fourier transform using integration by
parts. This identity shows that the Fourier transform
diagonalizes the Laplacian: the operation of taking the
Laplacian, when viewed using the Fourier transform, is
nothing more than multiplication of a function $F(\xi)$ by
the multiplier $-4\pi^2|\xi|^2$. The quantity $-4\pi^2|\xi|^2$ can be
interpreted as the energy level associated⁴ with the
frequency ξ. In other words, the Laplacian can be viewed
as a Fourier multiplier, meaning that to calculate the
Laplacian you take the Fourier transform, multiply by
the multiplier, and then take the inverse Fourier
transform again. This viewpoint allows one to manipulate
the Laplacian very easily. For instance, we can iterate
the above formula to compute higher powers of the
Laplacian:

$$\widehat{\Delta^n f}(\xi) = (-4\pi^2|\xi|^2)^n\hat f(\xi) \quad\text{for } n = 0, 1, 2, \ldots.$$

3. Strictly speaking, this is a generalized eigenfunction, as plane
waves are not square-integrable on $\mathbb{R}^d$.

4. When taking this view, it is customary to replace Δ by -Δ in order
to make the energies positive.

Indeed, we are now in a position to develop more general
functions of the Laplacian. For instance, we can
take a square root as follows:

$$\widehat{\sqrt{-\Delta}\,f}(\xi) = 2\pi|\xi|\hat f(\xi).$$

This leads to the theory of fractional differential oper- 
ators (which are in turn a special case of pseudodiffer- 
ential operators), as well as the more general theory 
of functional calculus [IV.19 §3.1], in which one 
starts with a given operator (such as the Laplacian) and 
then studies various functions of that operator, such 
as square roots, exponentials, inverses, and so forth. 
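
The eigenfunction relation that underlies all of this can be checked numerically in one dimension (d = 1), where the Laplacian is just the second derivative. The sketch below approximates the second derivative of a plane wave by a central difference; the function names, the step size h, and the sample values of ξ and x are arbitrary choices made for illustration.

```python
import cmath
import math

def plane_wave(xi):
    """Return the one-dimensional plane wave x -> exp(2*pi*i*x*xi)."""
    return lambda x: cmath.exp(2j * math.pi * x * xi)

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

xi, x = 1.5, 0.3
wave = plane_wave(xi)
numerical = second_derivative(wave, x)
predicted = -4 * math.pi**2 * xi**2 * wave(x)   # eigenvalue times the wave
```

The two values should agree up to the discretization error of the difference quotient, which shrinks like $h^2$.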

As the above discussion shows, the Fourier transform 
can be used to develop a number of interesting oper- 
ations, which have particular importance in the theory 
of differential equations. To analyze these operations 
effectively, one needs various estimates on the Fourier 
transform. For instance, it is often important to know 
how the size of a function /, as measured by some 
norm, relates to the size of its Fourier transform, as 
measured by a possibly different norm. For a further 
discussion of this point, see function spaces [III.29]. 
One particularly important and striking estimate of this 
type is the Plancherel identity, 

$$\int_{\mathbb{R}^d}|f(x)|^2\,dx = \int_{\mathbb{R}^d}|\hat f(\xi)|^2\,d\xi,$$

which shows that the $L^2$-norm of a Fourier transform is
actually equal to the $L^2$-norm of the original function.
The Fourier transform is therefore a unitary operation, 
so one can view the frequency-space representation of 
a function as being in some sense a “rotation” of the 
physical-space representation. 
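
The Plancherel identity has an exact analogue for the discrete Fourier transform on $\mathbb{Z}_N$, which is easy to test numerically. The discrete statement used below is an addition for illustration, not part of the text: with the normalization $\hat g(r) = N^{-1}\sum_n g(n)\omega^{-rn}$, the sum of $|\hat g(r)|^2$ over r equals the average of $|g(n)|^2$ over n. The sample data are arbitrary.

```python
import cmath
import math

def dft(g):
    """g_hat(r) = N^{-1} * sum_n g(n) * exp(-2*pi*i*r*n/N)."""
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * math.pi * r * n / N) for n in range(N)) / N
            for r in range(N)]

g = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
g_hat = dft(g)
mean_square = sum(abs(v) ** 2 for v in g) / len(g)     # average of |g(n)|^2
transform_square = sum(abs(v) ** 2 for v in g_hat)     # sum of |g_hat(r)|^2
```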

Developing further estimates related to the Fourier 
transform and associated operators is a major compo- 
nent of harmonic analysis. A variant of the Plancherel 
identity is the convolution formula: 

$$\int_{\mathbb{R}^d} f(y)g(x - y)\,dy = \int_{\mathbb{R}^d}\hat f(\xi)\hat g(\xi)e^{2\pi i x\cdot\xi}\,d\xi.$$

This formula allows one to analyze the convolution
$f * g(x) = \int_{\mathbb{R}^d} f(y)g(x - y)\,dy$ of two functions f, g
in terms of their Fourier transform; in particular, if
the Fourier coefficients of f or g are small, then we
expect the convolution f * g to be small as well. This
relationship means that the Fourier transform controls
certain correlations of a function with itself and with
other functions, which makes the Fourier transform
an important tool in understanding the randomness
and uniform distribution properties of various objects
in probability theory, harmonic analysis, and number
theory. For instance, one can pursue the above ideas
to establish the central limit theorem, which asserts
that the sum of many independent random variables
will eventually resemble a Gaussian distribution (see
probability distributions [III.73 §5]); one can even
use such methods to establish Vinogradov’s theorem
[V.29], that every sufficiently large odd number is
the sum of three primes.

There are many directions in which to generalize the 
above set of ideas. For instance, one can replace the 
Laplacian by a more general operator and the plane 
waves by (generalized) eigenfunctions of that operator. 
This leads to the subject of spectral theory [III.88] 
and functional calculus; one can also study the alge- 
bra of Fourier multipliers (and of convolution) more 
abstractly, which leads to the theory of C*-algebras
[IV.19 §3]. One can also go beyond the theory of lin- 
ear operators and study bilinear, multilinear, or even 
fully nonlinear operators. This leads in particular to 
the theory of paraproducts, which are generalizations 
of the pointwise product operation $(f(x), g(x)) \mapsto fg(x)$
that are of importance in differential equations.
In another direction, one can replace Euclidean space 
R d by a more general group, in which case the notion 
of a plane wave is replaced by the notion of a char- 
acter (if the group is Abelian) or a representation (if 
the group is non-Abelian). There are other variants of 
the Fourier transform, such as the Laplace transform 
or the Mellin transform (for more about other trans- 
forms, see the article transforms [III.93]), which are 
very similar algebraically to the Fourier transform and 
play similar roles (for instance, the Laplace transform is 
also useful in analyzing differential equations). We have 
already seen that Fourier transforms are connected to 
Taylor series; there is also a connection to some other 
important series expansions, notably Dirichlet series, 
as well as expansions of functions in terms of special 
polynomials [III.87] such as orthogonal polynomials 
or spherical harmonics [III.89].

The Fourier transform decomposes a function ex- 
actly into many components, each of which has a pre- 
cise frequency. In some applications it is more use- 
ful to adopt a “fuzzier” approach, in which a func- 
tion is decomposed into fewer components but each 
component has a range of frequencies rather than con- 
sisting purely of a single frequency. Such decomposi- 
tions can have the advantage of being less constrained 
by the uncertainty principle, which asserts that it is 
impossible for both a function and its Fourier trans- 
form to be concentrated in very small regions of R d . 
This leads to some variants of the Fourier transform, 
such as wavelet transforms [VII.3], which are better
suited to a number of problems in applied and
computational mathematics, and also to certain questions
in harmonic analysis and differential equations.
The uncertainty principle, being fundamental to quan- 
tum mechanics, also connects the Fourier transform to 
mathematical physics, and in particular to the connec- 
tions between classical and quantum physics, which 
can be studied rigorously using the methods of geo- 
metric quantization and microlocal analysis. 


III.28 Fuchsian Groups 

Jeremy Gray 


One of the most basic objects in geometry is the torus: 
a surface that has the shape of the surface of a bagel. 
If you want to construct one, you can do so by tak- 
ing a square and gluing opposite edges together. When 
you glue the top and bottom edges together you have 
a cylinder, and when you glue the other two edges 
together, which have now become circles, you obtain 
your torus. 

A more mathematical way of making a torus is as follows.
We start with the usual (x, y) coordinate plane
and the square in it with vertices at (0,0), (1,0), (1,1),
and (0,1), which consists of the points whose coordinates
satisfy 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. This square can be
moved around horizontally and vertically. If we shift it
m units horizontally and n units vertically, where m
and n are integers, we get the square that consists of
the points whose coordinates satisfy m ≤ x ≤ m + 1,
n ≤ y ≤ n + 1. As m and n run through all the integers,
we see that the copies of the square cover the whole 
plane, with four squares coming together at each point 
with integer coordinates. The plane is said to be tiled 
or tessellated (from the Latin word for a marble chip in 
a mosaic), and it is easy to see that you can color the 
squares alternately black and white and get an infinite 
checkerboard pattern. 

To make the torus we “identify” points. We say that
the points (x, y) and (x', y') correspond to the same
point in a certain new figure if x - x' and y - y' are
both integers. To see what the new figure looks like,
we observe that any point in the plane corresponds to 
a point inside, or on the edge of, our original square. 
Moreover, the point (x, y) corresponds to exactly one
point inside the square provided that neither x nor y
is an integer. So our new space looks a lot like our original
square. But what about the points (1/4, 0) and (1/4, 1)?
They correspond to the same point in our new space, as
do any corresponding pairs of points on the upper and
lower edges of our square. So those edges are identified
in our new space. By a similar argument, so too are the
left and right edges. The result is that, after points are
identified according to our rule, we obtain the torus.
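
The identification rule is easy to state in code. The sketch below is only an illustration: the function names are invented here, and the tolerance is an assumption made to cope with floating-point rounding.

```python
import math

def representative(x, y):
    """Reduce (x, y) to the point of the square 0 <= x, y < 1 identified with it."""
    return (x - math.floor(x), y - math.floor(y))

def same_torus_point(p, q, tol=1e-12):
    """True if x - x' and y - y' are both integers (up to rounding error)."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return abs(dx - round(dx)) < tol and abs(dy - round(dy)) < tol
```

For instance, a point on the lower edge of the square and the point directly above it on the upper edge are reported as the same point of the torus, while two distinct points in the interior of the square are not.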

If we make the torus in this way, we can draw small 
figures on it just by drawing them in the original square; 
lengths in the square will then correspond exactly to 
lengths on the torus. This is how old-fashioned print- 
ing on a drum works: an inked figure on a cylinder is 
rolled over the paper to make exact copies of the figure.
Thus, as far as small figures are concerned, the geom- 
etry of the torus is exactly like Euclidean geometry. In 
mathematical language we say that the geometry on the 
torus is induced from the geometry on the plane, and 
therefore that it is locally Euclidean. Globally, of course, 
it is different, because one can draw curves on the torus 
that cannot be shrunk to a point, whereas one cannot 
do so on the plane. 

Notice, too, that we have brought in a group to do 
the bulk of the work for us. In this case the group is 
the set of all pairs (m, n) where m and n are integers,
with (m, n) + (m', n') defined to be (m + m', n + n').

The torus and the sphere are but two of an infinite 
class of surfaces that are closed (they have no bound- 
ary) and compact (they do not in any sense go off to 
infinity). Other surfaces include the two-holed torus, 
and more generally the n-holed torus (the surfaces of 
genus 2, 3, 4, ...). To create these in a similar way, we
need Fuchsian groups. 

It is natural to expect that we can get other sur- 
faces by using polygons with more than four sides. It 
turns out that if you use a polygon with eight sides, 
for example a regular octagon, and glue sides 1 and 3 
together, 2 and 4 together, 5 and 7 together, and 6 and 
8 together, you get the two-holed torus. How can we use 
a group to achieve the same result, as we did with the 
torus? For that we need a way of fitting lots of copies 
of the octagon together so that they overlap only along 
edges. The problem is that one cannot tile the plane 
with octagons: the angles of an octagon are 135°, and 
that is far too big because we need eight octagons to fit 
together at each vertex. 
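
The arithmetic behind this obstruction is short enough to check. The helper names below are chosen for illustration; the formula (n - 2)·180°/n for the interior angle of a regular Euclidean n-gon is standard.

```python
def interior_angle_degrees(sides):
    """Interior angle of a regular Euclidean polygon with the given number of sides."""
    return (sides - 2) * 180 / sides

def angle_sum_at_vertex(sides, polygons_meeting):
    """Total angle when that many copies of the polygon meet at one vertex."""
    return polygons_meeting * interior_angle_degrees(sides)
```

Four squares meeting at a vertex give exactly 360°, whereas eight regular octagons would need 1080°; each octagon would have to have angles of 45° for eight of them to fit.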

The way forward here is to use hyperbolic geometry
[I.3 §6.6] instead of Euclidean geometry. But we
can also work with our bare hands. Take the unit disk
in the complex plane, D = {z : |z| < 1}. Take the
group of what are called Möbius transformations, which
are maps of the form z ↦ (az + b)/(cz + d). It is a
routine calculation to show that these maps send circles
and straight lines to circles and straight lines (they
mix the two types up, sometimes sending a circle to a
straight line and vice versa) and that they map angles
to equal angles. If we now select just those
Möbius transformations that map D to itself, then we
have a group that we shall call G. Indeed, we very nearly
have a Fuchsian group.
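
One standard family of Möbius transformations that map the unit disk to itself consists of the maps $z \mapsto e^{it}(z - a)/(1 - \bar a z)$ with |a| < 1 and t real. This explicit form is not given in the text and is brought in here as an illustration; the function name `disk_map` is likewise an assumption. The sketch lets one check numerically that points of the unit circle stay on the unit circle and points inside stay inside.

```python
import cmath
import math

def disk_map(a, t):
    """Return the map z -> e^{it} (z - a) / (1 - conj(a) z), assuming |a| < 1."""
    if abs(a) >= 1:
        raise ValueError("a must lie strictly inside the unit disk")
    rotation = cmath.exp(1j * t)
    return lambda z: rotation * (z - a) / (1 - a.conjugate() * z)

T = disk_map(0.3 + 0.4j, 0.7)
circle_points = [cmath.exp(2j * math.pi * k / 12) for k in range(12)]
moduli = [abs(T(z)) for z in circle_points]   # each should be 1 up to rounding
```

Composing two such maps gives another map of the disk to itself, which is consistent with these transformations forming a group.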

We need to find a shape that will play the role that 
the square played in the Euclidean plane. Our group 
G has the property that it maps diameters of D and 
arcs of circles perpendicular to the boundary of D to 
diameters of D and arcs of circles perpendicular to 
the boundary of D, so we let these play the role of 
straight lines and use eight of them as the edges of 
a (non-Euclidean) octagon. We find that we can do this 
in many ways, so we pick one with the highest degree 
of symmetry to make things easy for ourselves. That 
is, we draw a “regular octagon” centered on the center 
of the disk D. This still leaves us with some choice: the 
bigger the octagon, the smaller its angles. So we draw 
the octagon with angles of π/4, which allows eight of 
them to cluster at each vertex, and then we can fit them 
together as we want. If we identify points that lie in cor- 
responding places in different copies of the polygon, 
then the resulting space is a Riemann surface [III.81] 
of genus 2. 
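
The remark that bigger octagons have smaller angles can be made quantitative. The sketch below assumes the disk carries the curvature -1 normalization of the hyperbolic metric and uses the standard right-triangle relation cosh c = cot A cot B of hyperbolic trigonometry; neither the normalization nor the function names come from the text.

```python
import math

def interior_angle(n_sides, circumradius):
    """Interior angle of a regular hyperbolic n-gon with the given hyperbolic circumradius."""
    # Right triangle: angle pi/n at the center, half the interior angle at a
    # vertex, hypotenuse equal to the circumradius, so cosh(c) = cot(A) cot(B).
    half = math.atan(1.0 / (math.cosh(circumradius) * math.tan(math.pi / n_sides)))
    return 2.0 * half

def circumradius_for_angle(n_sides, angle):
    """Hyperbolic circumradius at which the regular n-gon has the requested interior angle."""
    return math.acosh(1.0 / (math.tan(math.pi / n_sides) * math.tan(angle / 2.0)))

def euclidean_radius(hyperbolic_radius):
    """Euclidean distance from the center of the disk for a given hyperbolic distance."""
    return math.tanh(hyperbolic_radius / 2.0)

for radius in (0.01, 1.0, 2.0, 4.0):
    print(radius, math.degrees(interior_angle(8, radius)))   # angles shrink from about 135 degrees

R = circumradius_for_angle(8, math.pi / 4)
print(R, euclidean_radius(R))   # where the vertices of the pi/4 octagon sit in the disk
```

Very small octagons are nearly Euclidean, with angles close to 135°, and the angle decreases toward 0 as the vertices approach the boundary, so the value π/4 is attained at exactly one size.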

A Fuchsian group is a subgroup of the group G 
(of Möbius transformations that map D to itself) that 
moves some polygon around “en bloc” and thereby tiles 
the disk. Just as with the torus, we have a notion of 
equivalent points (ones that are in the corresponding 
place in different tiles) and when we identify equiva- 
lent points we get the space that we would also have 
obtained by identifying the edges of the polygon in 
pairs, which is the space we wanted. 

All this can be described in the language of hyper- 
bolic geometry. The disk model is defined by means of 
a Riemannian metric [I.3 §6.10] on D, the differential 
of which is given, up to a constant factor, by 

\[ ds = \frac{|dz|}{1 - |z|^2}. \]



The elements of G move figures around in D in a way 
that preserves hyperbolic distances. It follows that the 
geometry on the surface that we obtain by identifying 
points in the manner just described is locally hyper- 
bolic, just as that of the torus was locally Euclidean. 
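
The distance-preserving property can be checked on an example. This sketch assumes the normalization ds = 2|dz|/(1 - |z|²), for which the distance has the closed form 2 artanh(|z - w| / |1 - w̄z|); other normalizations rescale every distance by the same constant. The particular map and points are arbitrary illustrative choices.

```python
import cmath
import math

def hyperbolic_distance(z, w):
    """Distance in the unit disk for the metric ds = 2|dz| / (1 - |z|^2)."""
    return 2.0 * math.atanh(abs(z - w) / abs(1 - w.conjugate() * z))

def disk_map(a, theta):
    """A disk-preserving Möbius map z -> e^{i theta} (z - a) / (1 - conj(a) z), |a| < 1."""
    rot = cmath.exp(1j * theta)
    return lambda z: rot * (z - a) / (1 - a.conjugate() * z)

g = disk_map(0.5 - 0.2j, 0.9)
z, w = 0.1 + 0.3j, -0.4 + 0.2j
print(hyperbolic_distance(z, w), hyperbolic_distance(g(z), g(w)))   # equal up to rounding
print(abs(z - w), abs(g(z) - g(w)))                                  # Euclidean distances differ
```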

It turns out that if we carry out the above construc- 
tion starting with a regular 4n-sided figure (with n ≥ 2), 
then we obtain a Riemann surface of genus n. But math- 
ematicians can do much more. If you go back to the 
plane and start not with a square but with a rectangle, 
or still more generally a parallelogram, it is reasonably 
easy to see that the same construction can be 
carried out. Indeed, if you just watch the original con- 
struction from an appropriate angle, instead of from 
vertically above the plane, then the square will turn into 
any parallelogram you choose (possibly enlarged or 
contracted). When you use a parallelogram, you again 
obtain a torus, but it differs from the original one in the 
same way that the square and the parallelogram differ: 
angles are distorted. It is a not entirely trivial exercise 
to show that the only angle-preserving maps from one 
parallelogram to another are similarities (uniform scal- 
ing by the same amount in two, and therefore all, direc- 
tions). So the resulting tori have a different sense of 
what angles are: that is, they have different conformal 
structures. 

The same happens in the hyperbolic disk. If one picks 
a 4n-sided polygon (its sides are parts of geodesics) 
whose edges come in pairs of equal length, and one 
finds a group that moves this polygon around en bloc 
and matches the edges exactly, then a Riemann surface 
is once again obtained, but if the polygons are 
not conformally equivalent, then neither are the cor- 
responding surfaces; they have the same genus, n, but 
different conformal structures. We can even go further 
and allow some of the vertices of the polygon to lie 
on the boundary of the disk, in which case the corre- 
sponding sides of the polygon are infinitely long with 
respect to the hyperbolic metric. The space we then 
construct is a “punctured” Riemann surface, and again 
mathematicians can vary its conformal structure. 

The fundamental importance of Fuchsian groups 
derives from the uniformization theorem, which says 
that all but the simplest Riemann surfaces arise from 
some Fuchsian group in the fashion described above. 
This includes every Riemann surface of genus greater 
than 1, and those of genus 1 with at least one puncture, 
with any possible conformal structure. 

The name Fuchsian group was given to these groups 
by Poincaré [VI.61] in 1881, who discovered them in 
the course of work on the hypergeometric equation and 
related differential equations, which had been inspired 
by the work of the German mathematician Lazarus 
Fuchs. Klein [VI.57] protested to him that a better pro- 
cedure might have been to name them after Schwarz, 
and Poincaré was willing to agree once he read the rele- 
vant paper by Schwarz, but by then Fuchs had given his 
approval to the name. When Klein protested too much 
(in Poincaré's view), Poincaré publicly gave the name 
Kleinian groups to the analogous class of groups that 
arise in the study of conformal transformations of the 
three-dimensional unit ball. The names have stuck ever 
since, but the study of Kleinian groups is much more 
difficult and neither Poincaré nor Klein could do much 
with the concept. However, the idea that every Riemann 
surface might arise from either the sphere, the Euclid- 
ean plane, or the hyperbolic plane was something they 
both came to conjecture. Rigorous proofs of this state- 
ment, the uniformization theorem, were to be given 
only in 1907, by Poincaré and Koebe independently. 

The formal definition of a Fuchsian group is as fol- 
lows. A subgroup H of the group of all Möbius transfor- 
mations is said to act discontinuously if, for every com- 
pact set K in the disk D, the sets h(K) and K are disjoint 
except for finitely many h ∈ H. A Fuchsian group is a 
subgroup H of the group of all Möbius transformations 
that acts discontinuously on the disk D. 


III.29 Function Spaces 

Terence Tao 


1 What Is a Function Space? 

When one works with real or complex numbers, there 
is a natural notion of the magnitude of a number x, 
namely its modulus |x|. One can also use this notion 
of magnitude to define a distance |x - y| between two 
numbers x and y and thereby say in a quantitative way 
which pairs of numbers are close and which ones are 
far apart. 

The situation becomes more complicated, however, 
when one deals with objects with more degrees of 
freedom. Consider for instance the problem of deter- 
mining the “magnitude” of a three-dimensional rect- 
angular box. There are several candidates for such a 
magnitude: length, width, height, volume, surface area, 
diameter (the length of a long diagonal), eccentric- 
ity, and so forth. Unfortunately, these magnitudes do 
not give equivalent comparisons: for example, box A 
may be longer and have a greater volume than box B, 
but box B may be wider and have a greater surface 
area. Because of this, one abandons the idea that there 
should be only one notion of “magnitude” for boxes, 
and instead accepts that there is a multiplicity of such 
notions and that they can all be useful: for some appli- 
cations one may wish to distinguish the large-volume 
boxes from the small-volume boxes, while in others one 
may wish to distinguish the eccentric boxes from the 
round boxes. Of course, there are several relationships 
between the different notions of magnitude (e.g., the 
isoperimetric inequality [IV.24] allows one to place 
an upper limit on the possible volume if one knows the 
surface area), so the situation is not as disorganized as 
it may at first appear. 

Now let us turn to functions with a fixed domain 
and range. (A good case to have in mind is functions 
f : [-1, 1] → R from the interval [-1, 1] to the real line 
R.) These objects have infinitely many degrees of free- 
dom, so it should not be surprising that there are now 
infinitely many distinct notions of “magnitude,” which 
all provide different answers to the question “how large 
is a given function f?” (or to the closely related ques- 
tion “how close together are two functions f and g?”). 
In some cases, certain functions may have infinite mag- 
nitude by one measure and finite magnitude by another 
(similarly, a pair of functions may be very close by one 
measure and very far apart by another). Again, this 
situation may seem chaotic, but it simply reflects the 
fact that functions have many distinct characteristics 
(some are tall, some are broad, some are smooth, some 
are oscillatory, and so forth) and that, depending on 
the application at hand, one may need to give more 
weight to one of these characteristics than to others. In 
analysis, these characteristics are embodied in a vari- 
ety of standard function spaces and their associated 
norms, which are available to describe functions both 
qualitatively and quantitatively. 

Formally, a function space is a normed space [III.64] 
X, the elements of which are functions (with some fixed 
domain and range). A majority (but certainly not all) 
of the standard function spaces considered in analysis 
are not just normed spaces but also Banach spaces 
[III.64]. The norm $\|f\|_X$ of a function f in X is the func- 
tion space's way of measuring how large f is. It is com- 
mon, though not universal, for the norm to be defined 
by a simple formula and for the space X to consist pre- 
cisely of those functions f for which the resulting def- 
inition $\|f\|_X$ makes sense and is finite. Thus, the mere 
fact that a function f belongs to a function space X can 
already convey some qualitative information about that 
function. For example, it may imply some regularity,¹ 
decay, boundedness, or integrability on the function f. 
The actual value of the norm $\|f\|_X$ makes this informa- 
tion quantitative. It may tell us how regular f is, how 
much decay it has, by which constant it is bounded, or 
how large its integral is. 


1. The more smoothly a function varies, the more “regular” it is. 


2 Examples of Function Spaces 

We now present a sample of commonly used function 
spaces. For simplicity we shall consider only spaces of 
functions from [-1, 1] to R. 

2.1 $C^0[-1, 1]$ 

This is the space of all continuous functions 
[I.3 §5.2] from [-1, 1] to R, and is sometimes denoted 
C[-1, 1]. Continuous functions are regular enough to 
allow one to avoid many of the technical subtleties 
associated with very rough functions. Continuous func- 
tions on a compact [III.9] interval such as [-1, 1] are 
bounded, so the most natural norm to place on this 
space is the supremum norm, denoted $\|f\|_\infty$, which 
is the largest possible value of |f(x)|. (Formally, it is 
defined to be $\sup\{|f(x)| : x \in [-1, 1]\}$, but for con- 
tinuous functions on [-1, 1] the two definitions are 
equivalent.) 

The supremum norm is the norm associated with uni- 
form convergence: a sequence $f_1, f_2, \dots$ converges uni- 
formly to f if and only if $\|f_n - f\|_\infty$ tends to 0 as n 
tends to ∞. The space $C^0[-1, 1]$ has the useful prop- 
erty that one can multiply functions together as well as 
adding them. This makes it a basic example of a Banach 
algebra. 

2.2 $C^1[-1, 1]$ 

This is a space that has a more restricted member- 
ship than $C^0[-1, 1]$: not only must a function f in 
$C^1[-1, 1]$ be continuous but it must also have a deriva- 
tive that is continuous. The supremum norm here is 
no longer a natural one, because a sequence of con- 
tinuously differentiable functions can converge in this 
norm to a nondifferentiable function. Instead, the right 
norm here is the $C^1$-norm $\|f\|_{C^1[-1,1]}$, which is defined 
to be $\|f\|_\infty + \|f'\|_\infty$. 

Notice that the $C^1$-norm measures both the size of 
a function and the size of its derivative. (Merely con- 
trolling the latter would be unsatisfactory, since it 
would give constant functions a norm of zero.) Thus 
it is a norm that forces a greater degree of regular- 
ity than the supremum norm. One can similarly define 
the space $C^2[-1, 1]$ of twice continuously differen- 
tiable functions, and so forth, all the way up to the 
space $C^\infty[-1, 1]$ of infinitely differentiable functions. 
(There are also “fractional” versions of these spaces, 
such as $C^{0,\alpha}[-1, 1]$, the space of α-Hölder continuous 
functions. We will not discuss these variants here.) 


2.3 The Lebesgue Spaces $L^p[-1, 1]$ 

The supremum norm $\|f\|_\infty$ mentioned earlier gives 
simultaneous control on the size of |f(x)| for all x in 
[-1, 1]. However, this means that if there is a tiny set 
of x for which |f(x)| is very large, then $\|f\|_\infty$ is very 
large, even if a typical value of |f(x)| is much smaller. 
It is sometimes more advantageous to work with norms 
that are less influenced by the values of a function on 
small sets. The $L^p$-norm of a function f is 

\[ \|f\|_p = \left( \int_{-1}^{1} |f(x)|^p \, dx \right)^{1/p}. \]

This is defined for $1 \le p < \infty$ and for any measurable f. 
The function space $L^p[-1, 1]$ is the class of measurable 
functions for which the above norm is finite. The norm 
$\|f\|_\infty$ of a measurable function f is its essential supre- 
mum: roughly speaking this means the largest value of 
|f(x)| if you ignore sets of measure zero. It turns out 
to be the limit of the norms $\|f\|_p$ as p tends to infin- 
ity. The space $L^\infty[-1, 1]$ consists of those measurable 
functions f for which $\|f\|_\infty$ is finite. While the $L^\infty$ norm 
is concerned solely with the “height” of a function, the 
$L^p$ norms are instead concerned with a combination of 
the “height” and “width” of a function. 
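
To see how differently these norms can judge the same function, one can evaluate them on a tall, narrow spike. The sketch below approximates the integrals by a midpoint rule; the spike, its height and width, and the grid size are arbitrary illustrative choices rather than anything taken from the text.

```python
def lp_norm(f, p, n=100_000):
    """Midpoint-rule approximation of (integral over [-1, 1] of |f|^p)^(1/p)."""
    h = 2.0 / n
    total = sum(abs(f(-1.0 + (k + 0.5) * h)) ** p for k in range(n))
    return (total * h) ** (1.0 / p)

def sup_norm(f, n=100_000):
    """Largest sampled value of |f| on a grid that includes both endpoints and 0."""
    return max(abs(f(-1.0 + 2.0 * k / n)) for k in range(n + 1))

def spike(x, height=50.0, half_width=0.01):
    """A tall, narrow triangular spike centered at 0."""
    return height * max(0.0, 1.0 - abs(x) / half_width)

print("sup norm:", sup_norm(spike))
for p in (1, 2, 8, 64):
    print(f"L^{p} norm:", lp_norm(spike, p))
```

The supremum norm sees only the height of the spike, whereas the $L^1$ norm also accounts for how narrow it is and comes out far smaller. For this example the $L^p$ norms increase with p and creep toward the supremum norm, in line with the limit statement above.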

Particularly important among these norms is the 
$L^2$-norm, since $L^2[-1, 1]$ is a Hilbert space [III.37]. 
This space is exceptionally rich in symmetries: there 
is a wide variety of unitary transformations, that is, 
invertible linear maps T defined on $L^2[-1, 1]$ such that 
$\|Tf\|_2 = \|f\|_2$ for every function f in $L^2[-1, 1]$. 


2.4 The Sobolev Spaces $W^{k,p}[-1, 1]$ 


The Lebesgue norms control, to some extent, the height 
and width of a function, but say nothing about regu- 
larity; there is no reason why a function in $L^p$ should 
be differentiable or even continuous. To incorporate 
such information one often turns to the Sobolev norms 
$\|f\|_{W^{k,p}[-1,1]}$, defined for $1 \le p \le \infty$ and $k \ge 0$ by 

\[ \|f\|_{W^{k,p}[-1,1]} = \sum_{j=0}^{k} \left\| \frac{d^j f}{dx^j} \right\|_p. \]

The Sobolev space $W^{k,p}[-1, 1]$ is the space of functions 
for which this norm is finite. Thus, a function lies in 
$W^{k,p}[-1, 1]$ if it and its first k derivatives all belong to 
$L^p[-1, 1]$. There is one subtlety: we do not require f to 
be k times differentiable in the usual sense, but in the 
weaker sense of distributions [III.18]. For instance, 
the function f(x) = |x| is not differentiable at zero, 
but it does have a natural weak derivative: the function 
f'(x) which is -1 when x < 0 and +1 when x > 0. 
This function lies in $L^\infty[-1, 1]$ (since the set {0} has 
measure zero, we do not need to specify f'(0)), and 
therefore f lies in $W^{1,\infty}[-1, 1]$ (which turns out to be 
the space of Lipschitz-continuous functions). We need 
to consider these generalized differentiable functions 
because without them the space $W^{k,p}[-1, 1]$ would not 
be complete. 
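
One way to make "weak derivative" concrete is the integration-by-parts identity ∫ f φ' dx = -∫ f' φ dx, which should hold for every smooth test function φ vanishing at the endpoints. The sketch below tests this for f(x) = |x| with a single test function; the choice of φ and the midpoint quadrature are assumptions made purely for illustration.

```python
import math

def integrate(func, n=100_000):
    """Midpoint rule on [-1, 1]; n is even, so x = 0 is a cell boundary, never a sample."""
    h = 2.0 / n
    return h * sum(func(-1.0 + (k + 0.5) * h) for k in range(n))

f = abs                                        # f(x) = |x|
weak_df = lambda x: -1.0 if x < 0 else 1.0     # the candidate weak derivative
phi = lambda x: (1 - x * x) ** 2 * math.exp(x)                        # vanishes at -1 and 1
dphi = lambda x: (-4 * x * (1 - x * x) + (1 - x * x) ** 2) * math.exp(x)

lhs = integrate(lambda x: f(x) * dphi(x))      # integral of f * phi'
rhs = -integrate(lambda x: weak_df(x) * phi(x))  # minus the integral of f' * phi
print(lhs, rhs)
```

A single test function proves nothing, of course, but the agreement of the two numbers is exactly what the distributional definition asks for, and the value assigned to f'(0) never enters the computation.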

Sobolev norms are particularly natural and useful 
in the analytical study of partial differential equations 
and mathematical physics. For instance, the $W^{1,2}$ norm 
can be interpreted as (the square root of) an “energy” 
associated with a function. 

3 Properties of Function Spaces 

There are many ways in which knowledge of the struc- 
ture of function spaces can assist in the study of func- 
tions. For instance, if one has a good basis for the func- 
tion space, so that every function in the space is a (pos- 
sibly infinite) linear combination of basis elements, and 
one has some quantitative estimates on how this linear 
combination converges to the original function, then 
this allows one to represent that function efficiently in 
terms of a number of coefficients, and also allows one 
to approximate that function by smoother functions. 
For instance, one basic result about $L^2[-1, 1]$ is the 
Plancherel theorem, which asserts, among other things, 
that there are numbers $(a_n)_{n=-\infty}^{\infty}$ such that 

\[ \left\| f - \sum_{n=-N}^{N} a_n e^{\pi i n x} \right\|_2 \to 0 \quad \text{as } N \to \infty. \]

This shows that any function in $L^2[-1, 1]$ can be 
approximated to any desired accuracy in $L^2$ by a 
trigonometric polynomial: that is, an expression of the 
form $\sum_{n=-N}^{N} a_n e^{\pi i n x}$. The number $a_n$ is the nth Fourier 
coefficient $\hat{f}(n)$ of f. It is given by the formula 

\[ \hat{f}(n) = \frac{1}{2} \int_{-1}^{1} f(x) e^{-\pi i n x} \, dx. \]

One can regard this result as saying that the func- 
tions $e^{\pi i n x}$ form a very good basis for $L^2[-1, 1]$. (They 
are in fact an orthonormal basis, once the inner product 
is normalized by a factor of 1/2: they have norm 1 and 
the inner product of two different ones is always zero.) 
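
The convergence asserted by the Plancherel theorem can be watched numerically. In this sketch the coefficients are computed as (1/2)∫ f(x) e^{-πinx} dx by a midpoint rule, and the $L^2$ error of the partial sums is measured on the same grid; the step function and the grid size are illustrative assumptions.

```python
import cmath
import math

M = 2000                                  # number of midpoint samples on [-1, 1]
H = 2.0 / M
XS = [-1.0 + (k + 0.5) * H for k in range(M)]

def fourier_coefficient(f, n):
    """a_n = (1/2) * integral over [-1, 1] of f(x) e^{-pi i n x} dx (midpoint rule)."""
    return 0.5 * H * sum(f(x) * cmath.exp(-1j * math.pi * n * x) for x in XS)

def l2_error(f, N):
    """Approximate L^2[-1, 1] distance between f and its degree-N trigonometric polynomial."""
    coeffs = {n: fourier_coefficient(f, n) for n in range(-N, N + 1)}
    def partial_sum(x):
        return sum(a * cmath.exp(1j * math.pi * n * x) for n, a in coeffs.items())
    return math.sqrt(H * sum(abs(f(x) - partial_sum(x)) ** 2 for x in XS))

step = lambda x: 1.0 if x > 0 else 0.0    # a discontinuous function in L^2[-1, 1]
errors = [l2_error(step, N) for N in (1, 4, 16, 64)]
print(errors)
```

For a discontinuous function such as this one the error decays slowly, roughly like the inverse square root of N, but it does tend to zero even though the partial sums cannot converge uniformly.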

Another very basic fact about function spaces is 
that certain function spaces embed into others, so 
that a function from one space automatically also 
belongs to other spaces. Furthermore, there is often 
some inequality that gives an upper bound for one 
norm in terms of another. For instance, a function in a 
high-regularity space such as $C^1[-1, 1]$ automatically 
belongs to a low-regularity space such as $C^0[-1, 1]$, 
and a function in a high-integrability space such as 
$L^\infty[-1, 1]$ automatically belongs to a low-integrability 
space such as $L^1[-1, 1]$. (This statement is no longer 
true if one replaces the interval [-1, 1] by a set of infi- 
nite measure, such as the real line R.) These inclusions 
cannot be reversed; however, one does have the Sobolev 
embedding theorem, which allows one to “trade” regu- 
larity for integrability. This result tells us that spaces 
with lots of regularity but low integrability can be 
embedded into spaces with low regularity but high 
integrability. A sample estimate of this type is 

\[ \|f\|_\infty \le \|f\|_{W^{1,1}[-1,1]}, \]

which tells us that if the integrals of |f(x)| and |f'(x)| 
are both finite, then f must be bounded (which is a 
far stronger integrability condition than the finiteness 
of $\|f\|_1$). 
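
The sample estimate can be tried out on a few explicit functions. The sketch below takes the $W^{1,1}$ norm to be the sum of the $L^1$ norms of f and f', approximates the integrals by a midpoint rule, and uses hand-computed derivatives; the three test functions are arbitrary choices.

```python
import math

def integrate_abs(func, n=20_000):
    """Midpoint-rule approximation of the integral of |func| over [-1, 1]."""
    h = 2.0 / n
    return h * sum(abs(func(-1.0 + (k + 0.5) * h)) for k in range(n))

def sup_abs(func, n=20_000):
    """Largest sampled value of |func| on a uniform grid of [-1, 1]."""
    return max(abs(func(-1.0 + 2.0 * k / n)) for k in range(n + 1))

# Each entry is a pair (f, f') with the derivative worked out by hand.
samples = {
    "sin(5x)": (lambda x: math.sin(5 * x), lambda x: 5 * math.cos(5 * x)),
    "exp(3x)": (lambda x: math.exp(3 * x), lambda x: 3 * math.exp(3 * x)),
    "1/(1+100x^2)": (lambda x: 1 / (1 + 100 * x * x),
                     lambda x: -200 * x / (1 + 100 * x * x) ** 2),
}
for name, (f, df) in samples.items():
    w11 = integrate_abs(f) + integrate_abs(df)
    print(f"{name}: sup norm {sup_abs(f):.4f}, W^(1,1) norm {w11:.4f}")
```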

Another very useful concept is that of duality 
[III.19]. Given a function space X, one can define the 
dual space X*, which is formally defined as the class 
of all continuous linear functionals on X, or more pre- 
cisely all maps ω : X → R (or ω : X → C, if the function 
space is complex valued) that are linear and continuous 
with respect to the norm of X. For example, it turns out 
that every continuous linear functional ω on the space 
$L^p[-1, 1]$ with $1 \le p < \infty$ is of the form 

\[ \omega(f) = \int_{-1}^{1} f(x) g(x) \, dx \]

for some function g in $L^q[-1, 1]$, where q is the dual 
or conjugate exponent of p, defined by the equation 
1/p + 1/q = 1. 

One can sometimes analyze functions in a function 
space by looking instead at how the linear function- 
als in the dual space act on those functions. Similarly, 
one can often analyze a continuous linear operator 
T : X → Y from one function space to another by 
first considering the adjoint operator T* : Y* → X*, 
defined for all linear functionals ω : Y → R by letting 
T*ω be the functional on X defined by the formula 
T*ω(x) = ω(Tx). 

We mention one more important fact about func- 
tion spaces, which is that certain function spaces X 
“interpolate” between two other function spaces $X_0$ and 
$X_1$. For example, there is a natural sense in which the 
spaces $L^p[-1, 1]$ with 1 < p < ∞ “lie between” the 
spaces $L^1[-1, 1]$ and $L^\infty[-1, 1]$. The precise definition 
of interpolation is too technical for this article, but its 
usefulness lies in the fact that the “extreme” spaces $X_0$ 
and $X_1$ are often easier to deal with than the “inter- 
mediate” spaces X. For this reason, it is sometimes 
possible to prove difficult results about X by proving 
much easier results about $X_0$ and $X_1$ and “interpolat- 
ing” between them. For instance, it can be used to give 
a short proof of Young’s inequality, which is the follow- 
ing statement. Let 1 < p, q, r < ∞ satisfy the equation 
1/p + 1/q = 1/r + 1, let f and g belong to $L^p(\mathbb{R})$ and 
$L^q(\mathbb{R})$, respectively, and let f * g be the convolution of f 
and g: that is, $f * g(x) = \int_{-\infty}^{\infty} f(y) g(x - y) \, dy$. Then 

\[ \left( \int_{-\infty}^{\infty} |f * g(x)|^r \, dx \right)^{1/r} \le \left( \int_{-\infty}^{\infty} |f(x)|^p \, dx \right)^{1/p} \left( \int_{-\infty}^{\infty} |g(x)|^q \, dx \right)^{1/q}. \]

Interpolation is useful here because the inequality is 
easy to prove in the extreme cases when p = 1, when 
q = 1, or when r = ∞. It is much harder to prove this 
result without the help of interpolation theory. 
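
Young's inequality can be tested on a grid. In the sketch below every integral is replaced by a Riemann sum of spacing h, which turns the statement into the corresponding inequality for finite sequences; the Gaussian, the indicator function, and the exponents p = 3/2 and q = 2 (which force r = 6) are illustrative assumptions.

```python
import math

def norm(values, p, h):
    """Riemann-sum version of the L^p norm on a grid of spacing h."""
    return (h * sum(abs(v) ** p for v in values)) ** (1.0 / p)

def convolve(f_vals, g_vals, h):
    """Riemann-sum approximation of (f * g)(x) = integral of f(y) g(x - y) dy."""
    out = [0.0] * (len(f_vals) + len(g_vals) - 1)
    for i, fv in enumerate(f_vals):
        for j, gv in enumerate(g_vals):
            out[i + j] += h * fv * gv
    return out

h = 0.05
xs = [-5.0 + h * k for k in range(201)]
f_vals = [math.exp(-x * x) for x in xs]                 # a Gaussian
g_vals = [1.0 if abs(x) <= 1.0 else 0.0 for x in xs]    # an indicator function

p, q = 1.5, 2.0
r = 1.0 / (1.0 / p + 1.0 / q - 1.0)                     # from 1/p + 1/q = 1/r + 1
lhs = norm(convolve(f_vals, g_vals, h), r, h)
rhs = norm(f_vals, p, h) * norm(g_vals, q, h)
print(r, lhs, rhs)
```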


III.30 Galois Groups 


Given a polynomial function f, the splitting field of f 
is defined to be the smallest field [I.3 §2.2] that con- 
tains all rational numbers and all the roots of f. The 
Galois group of f is the group of all automorphisms 
[I.3 §4.1] of the splitting field. Each such automorphism 
permutes the roots of f, so the Galois group can be 
thought of as a subgroup of the group of all permuta- 
tions [III.70] of these roots. The structure and proper- 
ties of the Galois group are closely connected with the 
solubility of the polynomial: in particular, the Galois 
group can be used to show that not all polynomials 
are solvable by radicals (that is, solvable by means of a 
formula that involves the usual arithmetic operations 
together with the extraction of roots). This theorem, 
spectacular as it is, is by no means the only application 
of Galois groups: they play a central role in modern 
algebraic number theory. 

For more details, see the insolubility of the quin- 
tic [V.24] and algebraic numbers [IV.3 §20]. 


III.31 The Gamma Function 

Ben Green 


If n is a positive integer, then its factorial, written n!, is 
the number 1 × 2 × ⋯ × n: that is, the product of all 
positive integers up to n. For example, the first eight 
factorials are 1, 2, 6, 24, 120, 720, 5040, and 40 320. 
(The exclamation mark was introduced by Christian 
Kramp 200 years ago as a convenience to the printer: 
it is perhaps also intended to convey some alarm at 
the rapidity with which n! grows. An obsolete nota- 
tion, which can still be found in some twentieth-century 
texts, is ⌊n.) From this definition, it might appear to be 
impossible to make sense of the idea of the factorial of 
a number that is not a positive integer, but, as it turns 
out, it is not just possible to do so, but also extremely 
useful. 

The gamma function, written Γ, is a function that 
agrees with the factorial function at positive integer 
values, but that makes sense for any real number, and 
even for any complex number. Actually, for various rea- 
sons it is natural to define Γ so that Γ(n) = (n - 1)! for 
n = 2, 3, .... Let us start by writing 

\[ \Gamma(s) = \int_0^\infty x^{s-1} e^{-x} \, dx, \tag{1} \]

without paying too much attention to whether the inte- 
gral converges. If we integrate by parts, then we find 
that 

\[ \Gamma(s) = \bigl[ -x^{s-1} e^{-x} \bigr]_0^\infty + \int_0^\infty (s-1) x^{s-2} e^{-x} \, dx. \tag{2} \]

As x tends to infinity, $x^{s-1} e^{-x}$ tends to zero, and if 
s is, for example, a real number greater than 1, then 
$x^{s-1} = 0$ when x = 0. Therefore, for such s, we can 
ignore the first term in the above expression. But the 
second one is simply s - 1 times the formula for 
Γ(s - 1), so we 
have shown that Γ(s) = (s - 1)Γ(s - 1), which is just 
what we need if we want to think of Γ(s) as something 
like (s - 1)!. 
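
The recurrence can be checked directly from the integral. The sketch below approximates formula (1) by a midpoint rule on a truncated interval (the cut-off at 50 and the grid size are assumptions that are adequate for moderate real s) and also compares Python's built-in `math.gamma` with factorials.

```python
import math

def gamma_integral(s, upper=50.0, n=200_000):
    """Midpoint-rule approximation of the integral of x^(s-1) e^(-x) over [0, upper]."""
    h = upper / n
    return h * sum(((k + 0.5) * h) ** (s - 1) * math.exp(-(k + 0.5) * h) for k in range(n))

s = 3.5
lhs = gamma_integral(s + 1)        # Gamma(s + 1) computed from the integral
rhs = s * gamma_integral(s)        # s * Gamma(s), as predicted by integration by parts
print(lhs, rhs)

# Gamma(n) = (n - 1)! at positive integers, using the standard library's gamma function.
factorials_match = all(math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 15))
print(factorials_match)
```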

It is not hard to show that the integral is in fact con- 
vergent whenever s is a complex number and Re(s) (the 
real part of s) is positive. Moreover, it defines a holo- 
morphic function [I.3 §5.6] in that region. When the 
real part of s is negative, the integral does not converge 
at all, and so the formula (1) cannot be used to define 
the gamma function in its entirety. However, we can 
instead use the property Γ(s) = (s - 1)Γ(s - 1) to extend 
the definition. For example, when -1 < Re(s) < 0, we 
know that the definition does not work directly, but it 
does work for s + 1, since Re(s + 1) > 0. We would like 
Γ(s + 1) to equal sΓ(s), so it makes sense to define Γ(s) 
to be Γ(s + 1)/s. Once we have done this, we can turn 
our attention to values of s with -2 < Re(s) < -1, and 
so on. 

The reader may object that in defining Γ(0) (for 
example), we have divided by zero. This is perfectly 
permissible, however, if all we require of Γ is that it 
should be meromorphic [V.34], because meromorphic 
functions are allowed to take the “value” ∞. Indeed, it is 
not hard to see that Γ, as we have defined it, has simple 
poles at 0, -1, -2, .... 
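
The step-by-step extension translates into a short recursion. This sketch handles real arguments only and leans on `math.gamma` in the region s > 0 where the integral converges; the function name `gamma_extended` is illustrative.

```python
import math

def gamma_extended(s):
    """Extend Gamma from s > 0 to negative non-integer reals via Gamma(s) = Gamma(s + 1) / s."""
    if s > 0:
        return math.gamma(s)       # region where the integral (1) converges
    if s == int(s):
        raise ValueError("Gamma has a pole at 0 and at each negative integer")
    return gamma_extended(s + 1) / s

for s in (-0.5, -1.5, -2.25, -3.7):
    print(s, gamma_extended(s), math.gamma(s))   # the two columns agree

print(gamma_extended(-1 + 1e-6))   # very large in absolute value: close to the pole at -1
```

Each recursive call moves s one unit to the right, so a point with -k < s < -k + 1 reaches the region of convergence after k divisions, exactly as in the strip-by-strip description above.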
There are in fact many functions that share the use- 
ful properties of Γ. (For instance, because cos(2πs) = 
cos(2π(s + 1)) for any s, and cos(2πn) = 1 for every 
integer n, the function F(s) = Γ(s) cos(2πs) also has 
the property F(s) = (s - 1)F(s - 1) and F(n) = (n - 1)!.) 
Nevertheless, for a variety of reasons, the function Γ, 
as we have defined it, is the most natural meromorphic 
extension of the factorial function. The most persua- 
sive reason is the fact that it arises so often in natural 
contexts, but it is also, in a certain sense, the smoothest 
interpolation of the factorial function to all positive 
real values. In fact, if f : (0, ∞) → (0, ∞) is such that 
f(x + 1) = x f(x), f(1) = 1, and log f is convex, then 
f = Γ. 

There are many interesting formulas involving Γ, 
such as Γ(s)Γ(1 - s) = π/sin(πs). There is also the 
famous result Γ(1/2) = √π, which is essentially equiva- 
lent to the fact that the area under the “normal distri- 
bution curve” $h(x) = (1/\sqrt{2\pi}) e^{-x^2/2}$ is 1 (this can be 
seen by making the substitution $x = u^2/2$ in (1)). A 
very important result concerning Γ is the Weierstrass 
product expansion, which states that 

\[ \frac{1}{\Gamma(z)} = z e^{\gamma z} \prod_{n=1}^{\infty} \left( 1 + \frac{z}{n} \right) e^{-z/n} \]

for all complex z, where γ is Euler’s constant: 

\[ \gamma = \lim_{n \to \infty} \left( 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n \right). \]

This formula makes it clear that Γ never vanishes, and 
that it has simple poles at 0 and the negative integers. 
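
These identities are easy to test in floating point. The sketch below restricts to real arguments, hard-codes Euler's constant to double precision, and truncates the Weierstrass product after a fixed number of factors; all three choices are assumptions made for the illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant, hard-coded to double precision

def reflection_gap(s):
    """|Gamma(s) Gamma(1 - s) - pi / sin(pi s)| for a real non-integer s."""
    return abs(math.gamma(s) * math.gamma(1 - s) - math.pi / math.sin(math.pi * s))

def weierstrass_reciprocal(z, terms=200_000):
    """Truncated z e^{gamma z} prod (1 + z/n) e^{-z/n}, in log form, for real z > -1, z != 0."""
    log_value = math.log(abs(z)) + EULER_GAMMA * z
    for n in range(1, terms + 1):
        log_value += math.log1p(z / n) - z / n
    return math.copysign(math.exp(log_value), z)

print(reflection_gap(0.3))                              # essentially zero
print(math.gamma(0.5), math.sqrt(math.pi))              # Gamma(1/2) = sqrt(pi)
print(weierstrass_reciprocal(2.5), 1 / math.gamma(2.5)) # truncated product vs. 1/Gamma
```

The truncated product converges slowly, with a relative error of order z²/(2N) after N factors, which is why so many terms are used.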

Why is the gamma function important? Reason 
enough is its frequent occurrence in many parts of 
mathematics, but one can attempt to explain why this 
should be so. One reason is that Γ, as defined in (1), is 
the Mellin transform of the unarguably natural func- 
tion $f(x) = e^{-x}$. The Mellin transform is a type of 
Fourier transform [III.27], but it is defined for func- 
tions on the group (R⁺, ×) rather than (R, +) (which is 
the habitat of the most familiar type of Fourier trans- 
form). For this reason, Γ is often seen in number theory, 
particularly analytic number theory [IV.4], where 
multiplicatively defined functions are often studied by 
taking Fourier transforms. 

One appearance of Γ in a number-theoretical con- 
text is in the functional equation for the Riemann zeta 
function [IV.4 §3], namely, 

\[ \Xi(s) = \Xi(1 - s), \]

where 

\[ \Xi(s) = \Gamma(s/2) \pi^{-s/2} \zeta(s). \tag{3} \]

The ζ function has a well-known product representa- 
tion 

\[ \zeta(s) = \prod_p (1 - p^{-s})^{-1}, \]

where the product is over primes and the representa- 
tion is valid for Re(s) > 1. The extra factor $\Gamma(s/2) \pi^{-s/2}$ 
in (3) may be regarded as coming from the “prime at 
infinity” (a term which may be rigorously defined). 
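
The product representation can be compared with the defining sum at a point where the value is known. The sketch below uses s = 2, where ζ(2) = π²/6 is a classical fact not stated in the text, and truncates both the sum and the product; the cut-offs are arbitrary.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def zeta_by_sum(s, terms=100_000):
    """Partial sum of 1/n^s."""
    return sum(n ** -s for n in range(1, terms + 1))

def zeta_by_product(s, limit=100_000):
    """Partial product of (1 - p^(-s))^(-1) over the primes up to limit."""
    value = 1.0
    for p in primes_up_to(limit):
        value /= 1.0 - p ** -s
    return value

print(zeta_by_sum(2), zeta_by_product(2), math.pi ** 2 / 6)
```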

Stirling’s formula is a very useful tool in dealing with 
the gamma function: it provides a rather accurate esti- 
mate for Γ(z) in terms of simpler functions. A very 
rough (but often useful) approximation for n! is $(n/e)^n$, 
which tells us that log(n!) is about n(log n - 1). Stir- 
ling’s formula is a sharper version of this crude esti- 
mate. Let δ > 0 and suppose that z is a complex num- 
ber that has modulus at least 1 and argument between 
-π + δ and π - δ. (This second condition keeps z away 
from the negative real axis, where the poles are.) Then 
Stirling’s formula states that 

\[ \log \Gamma(z) = (z - \tfrac{1}{2}) \log z - z + \tfrac{1}{2} \log 2\pi + E, \]

where the error E is at most C(δ)/|z|. Here, C(δ) 
stands for a certain positive real number that depends 
on δ. (The smaller you make δ, the larger you have to 
make C(δ).) Using this, one may confirm that Γ decays 
exponentially as Im z → ∞ in any fixed vertical strip in 
the complex plane. In fact, if α < σ < β, then 

\[ |\Gamma(\sigma + it)| \le C(\alpha, \beta) |t|^{\beta - 1/2} e^{-\pi |t|/2} \]

for all |t| > 1, uniformly in σ. 
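
For real z the size of the error term in Stirling's formula can be observed directly with the standard library's `math.lgamma`. This is a sketch for positive real arguments only; it says nothing about the complex sector in the statement above.

```python
import math

def stirling_log_gamma(z):
    """Main terms of Stirling's formula for log Gamma(z), real z >= 1."""
    return (z - 0.5) * math.log(z) - z + 0.5 * math.log(2 * math.pi)

for z in (1.0, 10.0, 100.0, 1000.0):
    error = math.lgamma(z) - stirling_log_gamma(z)
    print(f"z = {z:7.1f}   E = {error:.3e}   E * z = {error * z:.5f}")
```

The product E · z stays bounded, and in fact settles near 1/12 for these values, which is consistent with an error of size at most C(δ)/|z|.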


III.32 Generating Functions 


Suppose that you have defined a combinatorial struc- 
ture, and for each nonnegative integer n you wish to 
understand how many examples of this structure there 
are of size n. If $a_n$ denotes this number, then the 
object that you are trying to analyze is the sequence 
$a_0, a_1, a_2, a_3, \dots$. If the structure is quite complicated, 
then this may be a very hard problem, but one can 
sometimes make it easier by considering a different 
object, the generating function of the sequence, which 
contains the same information. 

To define this function, one simply regards the 
sequence $a_n$ as the sequence of coefficients in a 
power series. That is, the generating function f of the 
sequence is given by the formula 

\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots. \]

The reason this can be useful is that one can some- 
times derive a succinct expression for f and analyze it 
without reference to the individual numbers $a_n$. For 
example, one important generating function has the 
formula $f(x) = (1 - \sqrt{1 - 4x})/2x$. In such cases, one 
can deduce properties of the sequence $a_0, a_1, a_2, \dots$ 
from properties of f, rather than the other way round. 
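
As an illustration of passing from a formula for f back to the sequence, the coefficients of (1 - √(1 - 4x))/2x can be extracted exactly from the binomial series for the square root. The sketch uses exact fractions; the identification of the resulting numbers with the Catalan numbers C(2n, n)/(n + 1) is a standard fact that is not stated in the text.

```python
from fractions import Fraction
from math import comb

def sqrt_series(n_terms):
    """Coefficients of sqrt(1 - 4x) = sum_k binom(1/2, k) (-4x)^k, as exact fractions."""
    coeffs, binom = [], Fraction(1)
    for k in range(n_terms):
        coeffs.append(binom * (-4) ** k)
        binom = binom * (Fraction(1, 2) - k) / (k + 1)   # binom(1/2, k+1) from binom(1/2, k)
    return coeffs

def generating_function_coefficients(n_terms):
    """Coefficients a_0, a_1, ... of f(x) = (1 - sqrt(1 - 4x)) / (2x)."""
    s = sqrt_series(n_terms + 1)
    # 1 - sqrt(1 - 4x) has no constant term, so dividing by 2x just shifts the series.
    return [-s[k + 1] / 2 for k in range(n_terms)]

a = generating_function_coefficients(10)
print([int(c) for c in a])
print([comb(2 * n, n) // (n + 1) for n in range(10)])
```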

For more on generating functions, see enumerative 
and algebraic combinatorics [IV.22] and transforms 
[III.93]. 


III.33 Genus 


The genus is a topological invariant of surfaces: that 
is, a quantity associated with a surface that does not 
change when the surface is continuously deformed. 
Roughly speaking, it corresponds to the number of 
holes of that surface, so a sphere has genus 0, a torus 
has genus 1, a pretzel shape (that is, the surface of a 
blown-up figure of eight) has genus 2, and so on. If one 
triangulates an orientable surface and counts the num- 
ber of vertices, edges, and faces in the triangulation, 
denoting them V, E, and F, respectively, then the Euler 
characteristic is defined to be V - E + F. It can be shown 
that if g is the genus and χ is the Euler characteristic, 
then χ = 2 - 2g. See [I.4 §2.2] for a fuller discussion. 
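
The relation between genus and Euler characteristic can be tried on two small triangulations. In the sketch below the sphere is the boundary of a tetrahedron, and the torus is a 7-vertex triangulation with vertices taken modulo 7; that this particular list of triangles forms a torus is a known fact assumed here, not something derived in the text.

```python
from collections import Counter
from itertools import combinations

def euler_characteristic(triangles):
    """V - E + F for a triangulation given as a list of vertex triples."""
    vertices = {v for t in triangles for v in t}
    edges = {frozenset(e) for t in triangles for e in combinations(t, 2)}
    return len(vertices) - len(edges) + len(triangles)

def every_edge_in_two_triangles(triangles):
    """A necessary condition for the triangles to form a closed surface."""
    counts = Counter(frozenset(e) for t in triangles for e in combinations(t, 2))
    return all(c == 2 for c in counts.values())

def genus(triangles):
    """Genus of a closed orientable triangulated surface, from chi = 2 - 2g."""
    return (2 - euler_characteristic(triangles)) // 2

# Boundary of a tetrahedron: a triangulated sphere.
sphere = list(combinations(range(4), 3))

# A 7-vertex triangulation of the torus: triangles {i, i+1, i+3} and {i, i+2, i+3} mod 7.
torus = [tuple(sorted((i % 7, (i + a) % 7, (i + 3) % 7))) for i in range(7) for a in (1, 2)]

print(euler_characteristic(sphere), genus(sphere))
print(euler_characteristic(torus), genus(torus))
```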

A famous result of Poincaré [VI.61] states that for 
every nonnegative integer g there is precisely one ori- 
entable surface of genus g. (Moreover, genus can also 
be defined for nonorientable surfaces, where a similar 
result holds.) See differential topology [IV.9 §2.3] 
for more about this theorem. 

One can associate an orientable surface, and there- 
fore a genus, with a smooth algebraic curve. An ellip- 
tic curve [III.21] can be defined as a smooth curve of 
genus 1. See algebraic geometry [IV.7 §10] for more 
details. 


III.34 Graphs 


A graph is one of the simplest of all mathematical struc- 
tures: it consists of some elements called vertices (of 
which there are usually just finitely many), some pairs 
of which are deemed to be “joined” or “adjacent.” It is 
customary to represent the vertices by points in a plane 
and to join adjacent points by a line. The line is referred 
to as an edge (though how the line is drawn or visual- 
ized is irrelevant: all that is important is whether or not 
two points are joined). 

For example, the rail network of a country can be rep- 
resented by a graph: we can use vertices to represent 
the stations, and we can join two vertices if they repre- 
sent consecutive stations along some rail line. Another 
example is provided by the Internet: 
the vertices are all the world's computers, and two are 
adjacent if there is a direct link between them. 

Many questions in graph theory take the form of ask- 
ing what some structural property of a graph can tell 
you about its other properties. For example, suppose 
that we are trying to find a graph with n vertices that 
does not contain a triangle (defined to be a set of three 
vertices that are mutually joined). How many edges can 
the graph have? Clearly n²/4 is possible, at least if n is 
even, since one can then divide up the n vertices into 
two equal classes and join all vertices in one class to all 
vertices in the other. But can there be more edges than 
that? 
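
For very small n the question can be settled by brute force. The sketch below builds the two-class construction described above and compares it with an exhaustive search over all graphs on n vertices; the search is an illustration that is feasible only for tiny n and is no substitute for a proof.

```python
from itertools import combinations

def has_triangle(n, edges):
    """True if some three vertices are mutually joined."""
    adjacent = [[False] * n for _ in range(n)]
    for u, v in edges:
        adjacent[u][v] = adjacent[v][u] = True
    return any(adjacent[a][b] and adjacent[b][c] and adjacent[a][c]
               for a, b, c in combinations(range(n), 3))

def balanced_bipartite(n):
    """Split the vertices into two classes as equal as possible and join across."""
    return [(u, v) for u in range(n // 2) for v in range(n // 2, n)]

def max_triangle_free_edges(n):
    """Exhaustive search over all graphs on n vertices (only feasible for small n)."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        if len(edges) > best and not has_triangle(n, edges):
            best = len(edges)
    return best

for n in range(2, 7):
    print(n, len(balanced_bipartite(n)), max_triangle_free_edges(n))
```

In each of these small cases the search finds nothing better than the bipartite construction, which agrees with the classical theorem of Mantel that a triangle-free graph on n vertices has at most n²/4 edges.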

Here is another example of a typical question about 
graphs. Let k be a positive integer. Must there exist an 
n such that every graph with n vertices always contains 
either k vertices that are all joined to each other or k 
vertices none of which are joined to each other? This 
question is quite easy for k = 3 (where n = 6 suffices), 
but already for k = 4 it is not obvious that such an n 
exists. 
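
The case k = 3 is small enough to check exhaustively. The following sketch enumerates every graph on n vertices and looks for k mutually joined or k mutually non-joined vertices; it is an illustration for tiny n only, since the number of graphs grows as 2 to the power n(n - 1)/2.

```python
from itertools import combinations

def has_clique_or_independent_set(n, k, edge_set):
    """True if some k vertices are all joined to each other or all not joined."""
    for group in combinations(range(n), k):
        joined = {pair in edge_set for pair in combinations(group, 2)}
        if len(joined) == 1:          # all pairs joined, or all pairs not joined
            return True
    return False

def every_graph_has_one(n, k):
    """Check all graphs on n vertices; only feasible for small n."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        edge_set = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        if not has_clique_or_independent_set(n, k, edge_set):
            return False
    return True

print(every_graph_has_one(5, 3))   # five vertices are not enough (a pentagon is a counterexample)
print(every_graph_has_one(6, 3))   # six vertices suffice
```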

For more on these problems (the first is the found- 
ing problem of “extremal graph theory,” while the sec- 
ond is the founding problem of “Ramsey theory”) and 
on the study of graphs in general, see extremal and 
PROBABILISTIC COMBINATORICS [IV.23]. 


III.35 Hamiltonians

Terence Tao 


At first glance, the many theories and equations of 
modern physics exhibit a bewildering diversity: com- 
pare, for instance, classical mechanics with quan- 
tum mechanics, nonrelativistic physics with relativistic 
physics, or particle physics with statistical mechanics. 
However, there are strong unifying themes connecting 
all of these theories. One of these is the remarkable 
fact that in all of them the evolution of a physical sys-
tem over time (as well as the steady states of that sys-
tem) is largely controlled by a single object, the Hamil-
tonian of that system, which can often be interpreted
as describing the total energy of any given state in that
system. Roughly speaking, each physical phenomenon
(e.g., electromagnetism, atomic bonding, particles in a
potential well, etc.) may correspond to a single Hamil-
tonian H, while each type of mechanics (classical, quan-
tum, statistical, etc.) corresponds to a different way
of using that Hamiltonian to describe a physical sys-
tem. For instance, in classical physics, the Hamiltonian
is a function $(q,p) \mapsto H(q,p)$ of the positions q and
momenta p of the system, which then evolve according 
to Hamilton’s equations: 

$$\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q}.$$
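To see Hamilton's equations at work, one can integrate them numerically for a simple Hamiltonian. The sketch below assumes, purely for illustration, a harmonic oscillator with H(q, p) = p^2/2 + q^2/2 and uses the symplectic Euler scheme; neither choice comes from the text.

```python
import math

def hamiltonian(q, p):
    """Illustrative choice: a unit-mass, unit-frequency harmonic oscillator."""
    return 0.5 * p * p + 0.5 * q * q

def dH_dq(q, p):
    return q

def dH_dp(q, p):
    return p

def evolve(q, p, dt, steps):
    """Symplectic Euler: update p from dp/dt = -dH/dq, then q from dq/dt = dH/dp."""
    for _ in range(steps):
        p -= dt * dH_dq(q, p)
        q += dt * dH_dp(q, p)
    return q, p

steps = int(round(2 * math.pi / 0.001))      # roughly one period of the oscillator
q, p = evolve(1.0, 0.0, dt=0.001, steps=steps)
print(q, p, hamiltonian(q, p))               # close to (1, 0, 0.5)
```

After one period the state returns close to its starting point and the value of H, the total energy, is nearly unchanged, as one would expect from a quantity that controls the evolution.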

In (nonrelativistic) quantum mechanics, the Hamilto-
nian H becomes a linear operator [III.52] (which
is often a formal combination of the position opera-
tors q and momentum operators p), and the wave func-
tion $\psi$ of the system then evolves according to the
SCHRÖDINGER EQUATION [III.85]:

$$i\hbar\,\frac{\partial}{\partial t}\psi = H\psi.$$

In statistical mechanics, the Hamiltonian H is a func-
tion of the microscopic state (or microstate) of a system,
and the probability that a system at a given tempera-
ture T will lie in a given microstate is proportional to
$e^{-H/kT}$. And so on and so forth.

Many fields of mathematics are closely intertwined
with their counterparts in physics, and so it is not sur- 
prising that the concept of a Hamiltonian also appears 
in pure mathematics. For instance, motivated by clas- 
sical physics, Hamiltonians (as well as generalizations 
of Hamiltonians, such as moment maps) play a major 
role in dynamical systems, differential equations, Lie 
group theory, and symplectic geometry. Motivated by 
quantum mechanics, Hamiltonians (as well as gener- 
alizations, such as observables or pseudo-differential 
operators) are similarly prominent in operator alge- 
bras, spectral theory, representation theory, differen- 
tial equations, and microlocal analysis. 

Because of their presence in so many areas of physics 
and mathematics, Hamiltonians are useful for build-
ing bridges between seemingly unrelated fields: for
instance, between classical mechanics and quantum 
mechanics, or between symplectic mechanics and oper- 
ator algebras. The properties of a given Hamiltonian 
often reveal much about the physical or mathematical 
objects associated with that Hamiltonian. For example, 
the symmetries of a Hamiltonian often induce corre- 
sponding symmetries in objects described using that 
Hamiltonian. While not every interesting feature of a 
mathematical or physical object can be read off directly
from its Hamiltonian, this concept is still fundamental 
to understanding the properties and behavior of such 
objects. 


See also vertex operator algebras [IV.13 §2.1], 
MIRROR SYMMETRY [IV.14 §§2.1.3, 2.2.1], and SYMPLEC- 
TIC MANIFOLDS [III.90 §2.1]. 


III.36 The Heat Equation

Igor Rodnianski 


The heat equation was first proposed by Fourier
[VI.25] as a mathematical description of the trans-
fer of heat in solid bodies. Its influence has subse-
quently been felt in many corners of mathematics:
it provides explanations for such disparate phenom-
ena as the formation of ice (the Stefan problem), the
theory of incompressible viscous fluids (the Navier-
Stokes equation [III.23]), geometric flows (e.g., curve
shortening, and the harmonic-map heat flow prob-
lem), Brownian motion [IV.25], liquid filtration in
porous media (the Hele-Shaw problem), index theorems
(e.g., the Gauss-Bonnet-Chern formula), the price of
stock options (the Black-Scholes formula [VII.9 §2]),
and the topology of three-dimensional manifolds (the
Poincaré conjecture [V.28]). But the bright future of
the heat equation could have been predicted at its birth:
after all, another small event that accompanied it was
the creation of Fourier analysis [III.27].

The propagation of heat is based on a simple conti-
nuity principle. The change in the quantity of heat u in
a small volume $\Delta V$ over a small interval of time $\Delta t$ is
approximately

$$CD\,\frac{\partial u}{\partial t}\,\Delta t\,\Delta V,$$

where C is the heat capacity of the substance and
D is its density; but it is also given by the amount
of heat entering and exiting through the boundary of $\Delta V$,
which is approximately

$$K\,\Delta t\int_{\partial\Delta V}\frac{\partial u}{\partial n},$$

where K is the heat conductivity constant and n is the
unit normal to the boundary of $\Delta V$.

Thus, setting the values of all physical constants to 1,
dividing through by $\Delta t$ and $\Delta V$, and letting them tend
to zero, we find that the evolution of the amount of heat
(that is, the temperature) in a three-dimensional solid
$\Omega$ is governed by the following classical heat equation,
where u(t, x) is the temperature at time t at the point
x = (x, y, z):

$$\frac{\partial}{\partial t}u(t,x) - \Delta u(t,x) = 0. \tag{1}$$

Here

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$

is the three-dimensional Laplacian; $\Delta u$ is the limit as
the diameter of $\Delta V$ tends to zero of the quantity

$$\frac{1}{\Delta V}\int_{\partial\Delta V}\frac{\partial u}{\partial n}.$$

To determine u(t, x), equation (1) needs to be com-
plemented by the initial distribution $u_0(x) = u(0,x)$
and boundary conditions on the solid interface $\partial\Omega$. For
example, for a solid unit cube C with surface main-
tained at zero temperature, the heat equation is consid-
ered as a problem with Dirichlet boundary conditions
and, as was proposed by Fourier, u(t, x) can be found
by the method of separation of variables by expanding
$u_0(x)$ into its Fourier series

$$u_0(x,y,z) = \sum_{k,m,l=0}^{\infty} C_{kml}\,\sin(\pi kx)\,\sin(\pi my)\,\sin(\pi lz),$$

which leads to the solution

$$u(t,x,y,z) = \sum_{k,m,l=0}^{\infty} e^{-\pi^2(k^2+m^2+l^2)t}\,C_{kml}\,\sin(\pi kx)\,\sin(\pi my)\,\sin(\pi lz).$$
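A quick numerical check makes the separation-of-variables solution more concrete. The sketch below takes a hypothetical initial distribution with only two nonzero coefficients, evaluates the series, and estimates the quantity du/dt minus the Laplacian of u by central differences. The coefficients, the sample point, and the step size are all arbitrary choices made for illustration.

```python
import math

def u(t, x, y, z, coeffs):
    """Series solution; coeffs maps (k, m, l) to C_kml (finitely many nonzero terms)."""
    return sum(c * math.exp(-math.pi ** 2 * (k * k + m * m + l * l) * t)
               * math.sin(math.pi * k * x) * math.sin(math.pi * m * y) * math.sin(math.pi * l * z)
               for (k, m, l), c in coeffs.items())

def heat_residual(t, x, y, z, coeffs, h=1e-4):
    """Central-difference estimate of du/dt - Laplacian(u) at one point."""
    u_t = (u(t + h, x, y, z, coeffs) - u(t - h, x, y, z, coeffs)) / (2 * h)
    centre = u(t, x, y, z, coeffs)
    lap = (u(t, x + h, y, z, coeffs) + u(t, x - h, y, z, coeffs)
           + u(t, x, y + h, z, coeffs) + u(t, x, y - h, z, coeffs)
           + u(t, x, y, z + h, coeffs) + u(t, x, y, z - h, coeffs) - 6 * centre) / h ** 2
    return u_t - lap

coeffs = {(1, 1, 1): 1.0, (2, 1, 1): -0.5}           # hypothetical initial data
print(heat_residual(0.01, 0.3, 0.4, 0.6, coeffs))    # small compared with du/dt itself
print(u(0.01, 0.0, 0.4, 0.6, coeffs))                # vanishes on the face x = 0
```

The residual is not exactly zero because the derivatives are only approximated, but it is several orders of magnitude smaller than the individual terms.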

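This simple example already illuminates a fundamen-
tal property of the heat equation: the tendency of its
solutions to converge to an equilibrium state. In this
case it reflects a physically intuitive fact that the tem-
perature u(t, x) converges to the constant distribution
$u^*(x) = 0$, the temperature at which the surface is held:
every term of the series with $k, m, l \ge 1$ decays
exponentially, and every term with a zero index vanishes
identically.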

Propagation of heat in an insulated body corresponds 
to the choice of the Neumann boundary conditions, in 
which the normal derivative of u (normal, that is, to 
the boundary $\partial\Omega$) is set to vanish. Its solutions can be
constructed in a similar fashion. 

The reason that Fourier analysis is intimately con-
nected with the heat equation is that the trigonometric
functions are eigenfunctions [1.3 §4.3] of the Lapla-
cian. A variety of more general heat equations can be
obtained if one replaces the Laplacian by a more general
linear, self-adjoint [III.52 §3.2], nonnegative Hamil-
tonian [III.35] H with a discrete set of eigenvalues
$\lambda_n$ and corresponding eigenfunctions $\psi_n$. That is, one
considers the heat flow

$$\frac{\partial}{\partial t}u + Hu = 0.$$

The solution u(t) is given by the formula $u(t) = e^{-tH}u_0$,
where $e^{-tH}$ is the heat semigroup generated
by H, which also takes the more explicit form

$$u(t,x) = \sum_{n=0}^{\infty} e^{-\lambda_n t}\,C_n\,\psi_n(x).$$


Here the coefficients $C_n$ are the Fourier coefficients of
$u_0$ relative to H: that is, they are the coefficients that
arise when we write $u_0$ as a sum $\sum_{n=0}^{\infty} C_n\psi_n$. (The
existence of such a decomposition follows from the
spectral theorem [III.52 §3.4] for self-adjoint opera-
tors. In a similar way, heat flows can also be gener-
ated by self-adjoint operators with a continuous spec-
trum.) In particular, the asymptotic behavior of u(t, x)
as $t \to +\infty$ is completely determined by the spectrum
of H.

Although explicit, representations like this do not 
provide very good quantitative descriptions of the 
behavior of the heat equation. To obtain such descrip- 
tions one has to abandon the idea of constructing solu- 
tions explicitly and look instead for principles and 
methods that apply to general classes of solutions 
while also being sufficiently robust to be useful in the 
analysis of more complicated heat equations. 

The first methods of this type are called energy iden- 
tities. To derive an energy identity, one multiplies the 
heat equation by a certain quantity, which may depend 
on the given solution, and integrates by parts. The sim- 
plest two identities of this type are the conservation of 
total heat of an insulated body, 


$$\frac{d}{dt}\int_\Omega u(t,x)\,dx = 0,$$

and the energy identity,

$$\int_\Omega u^2(t,x)\,dx + 2\int_0^t\!\int_\Omega |\nabla u(s,x)|^2\,dx\,ds = \int_\Omega u^2(0,x)\,dx.$$

The second identity already captures a fundamental 
smoothing property of the heat equation: since all three 
integrands are nonnegative and the first and third inte- 
grals are finite, the average of the mean-square gradient 
of u is finite, even if the initial mean-square gradient is 
infinite, and it even decreases to zero with t. In fact,
away from the boundary of $\Omega$ an arbitrary amount of
smoothing takes place, and not just on average but at 
every time t > 0. 

The second fundamental principle of the heat equa- 
tion is the global maximum principle 


$$\max_{x\in\Omega,\;0\le t\le T} u(t,x) \le \max\Bigl(\max_{x\in\Omega} u(0,x),\; \max_{x\in\partial\Omega,\;0\le t\le T} u(t,x)\Bigr),$$

which tells us the familiar fact that the hottest spot in
the body, over all time, is either on its boundary or in 
the initial distribution. 
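Both principles have simple discrete counterparts that can be observed directly. The sketch below is an illustration only: it works in one space dimension, uses the standard explicit finite-difference scheme with a step ratio r = dt/dx^2 of at most 1/2, and starts from made-up initial data with the boundary held at zero temperature.

```python
def heat_step(u, r):
    """One explicit finite-difference step; r = dt/dx**2 and the two end values are held fixed."""
    return [u[0]] + [u[j] + r * (u[j + 1] - 2 * u[j] + u[j - 1])
                     for j in range(1, len(u) - 1)] + [u[-1]]

def run(u0, r, steps):
    history = [list(u0)]
    for _ in range(steps):
        history.append(heat_step(history[-1], r))
    return history

# Hypothetical initial data on 21 grid points with the boundary held at zero temperature.
u0 = [0.0] + [1.0 if 8 <= j <= 12 else 0.2 for j in range(1, 20)] + [0.0]
history = run(u0, r=0.4, steps=200)

hottest_ever = max(max(row) for row in history)
hottest_initial_or_boundary = max(max(u0), max(max(row[0], row[-1]) for row in history))
energy = [sum(v * v for v in row) for row in history]
print(hottest_ever <= hottest_initial_or_boundary)        # True: discrete maximum principle
print(all(a >= b for a, b in zip(energy, energy[1:])))    # True: the sum of squares never grows
```

With r at most 1/2 each new value is a weighted average of three old ones, which is why the maximum cannot increase; for larger r the scheme is unstable and neither property need hold.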





Finally, the diffusive properties of the heat equa-
tion in $\mathbb{R}^n$ are captured by the Harnack inequality for
nonnegative solutions u. It tells us that

$$\frac{u(t_2,x_2)}{u(t_1,x_1)} \ge \Bigl(\frac{t_1}{t_2}\Bigr)^{n/2} e^{-|x_2-x_1|^2/4(t_2-t_1)}$$

when $t_2 > t_1$. This tells us that if the temperature at $x_1$
at time $t_1$ takes a certain value, then the temperature
at $x_2$ at time $t_2$ cannot be too much smaller.

This form of the Harnack inequality features a very
important object in the study of the heat equation,
called the heat kernel:

$$p(t,x,y) = \frac{1}{(4\pi t)^{n/2}}\,e^{-|x-y|^2/4t}.$$

One of its many uses is that it allows one to construct 
solutions of the heat equation in the whole of space 
(that is, in $\mathbb{R}^n$) from initial data $u_0$, by the formula

$$u(t,x) = \int_{\mathbb{R}^n} p(t,x,y)\,u_0(y)\,dy.$$

It also shows that after a time t initial point dis-
turbances become distributed in a ball of radius $\sqrt{t}$
around the point of the original disturbance. This sort 
of relation between spatial scales and timescales is the 
characteristic parabolic scaling of the heat equation. 

As was shown by Einstein, the heat equation is inti-
mately connected with the diffusion process of Brown-
ian motion. In fact, the mathematical description of
Brownian motion is in terms of a random process
$B_t$ with transitional probability densities given by the
heat kernel p(t, x, y). For the n-dimensional Brownian
motion $B_t^x$ starting at x, the function

$$u(t,x) = \mathbb{E}\bigl[u_0(\sqrt{2}\,B_t^x)\bigr]$$

computed with the help of expectation value $\mathbb{E}$ is pre-
cisely the solution of the heat equation in $\mathbb{R}^n$ with initial
data $u_0(x)$. This connection is the start of a mutually
beneficial relationship between the theory of the heat
equation and probability. Among the most profitable
applications of this relationship is the Feynman-Kac
formula

$$u(t,x) = \mathbb{E}\Bigl[\exp\Bigl(-\int_0^t V(\sqrt{2}\,B_s^x)\,ds\Bigr)\,u_0(\sqrt{2}\,B_t^x)\Bigr],$$

which connects Brownian motion with solutions of the
heat equation

$$\frac{\partial}{\partial t}u(t,x) - \Delta u(t,x) + V(x)\,u(t,x) = 0$$

with initial data $u_0(x)$.
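The probabilistic representation of the solution lends itself to a Monte Carlo experiment. The sketch below works in one dimension and reads the formula as an average of the initial data over the points x + sqrt(2) W_t, where W_t is a centered Gaussian of variance t; that reading, the sample size, and the test function cos are assumptions made for the illustration. For the initial data cos(x) the exact solution of the heat equation is e^{-t} cos(x), which gives something to compare against.

```python
import math
import random

def heat_solution_mc(u0, t, x, samples=200_000, seed=0):
    """Monte Carlo estimate of E[u0(x + sqrt(2) * W_t)], with W_t ~ N(0, t)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * t)          # standard deviation of sqrt(2) * W_t
    return sum(u0(x + scale * rng.gauss(0.0, 1.0)) for _ in range(samples)) / samples

t, x = 0.5, 0.3
estimate = heat_solution_mc(math.cos, t, x)
exact = math.exp(-t) * math.cos(x)      # solves u_t = u_xx with u(0, x) = cos(x)
print(estimate, exact)                  # the two agree to about two decimal places
```

The statistical error shrinks only like the inverse square root of the number of samples, so this is a way of seeing the connection rather than an efficient way of solving the equation.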

The three fundamental principles of the heat equa- 
tion described above are remarkably robust, in the 
sense that they, or weaker versions of them, hold even
for very general variants of the classical equation. For
instance, they can be applied to the question of the
continuity of solutions of the heat equation

$$\frac{\partial}{\partial t}u - \sum_{i,j}\frac{\partial}{\partial x_i}\Bigl(a_{ij}(x)\,\frac{\partial}{\partial x_j}u\Bigr) = 0,$$

where all that is assumed of the coefficients $a_{ij}$ is that
they are bounded and that they satisfy the ellipticity
condition $\lambda|\xi|^2 \le \sum a_{ij}\xi_i\xi_j \le \Lambda|\xi|^2$. One can even look
at the equations in “nondivergence form”:

$$\frac{\partial}{\partial t}u - \sum_{i,j} a_{ij}(x)\,\frac{\partial^2}{\partial x_i\,\partial x_j}u = 0.$$

Here, the connection between the heat equation and the
corresponding stochastic diffusion process turns out to
be particularly helpful. This analysis has led to beauti-
ful applications in the calculus of variations [III.96]
and in fully nonlinear problems. 

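The same principles also hold for the heat equations
on Riemannian manifolds [1.3 §6.10]. The appropri-
ate analogue of the Laplacian for a manifold M is the
Laplace-Beltrami operator $\Delta_M$, and the heat equation
for M is

$$\frac{\partial}{\partial t}u - \Delta_M u = 0.$$

If the Riemannian metric is g, then in local coordinates
$\Delta_M$ takes the form

$$\Delta_M = \frac{1}{\sqrt{\det g}}\sum_{i,j}\frac{\partial}{\partial x_i}\Bigl(g^{ij}\sqrt{\det g}\;\frac{\partial}{\partial x_j}\Bigr),$$

where $g^{ij}$ denotes the inverse of the matrix $g_{ij}$.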

In this case, a version of the Harnack inequality holds
for the heat equation on a manifold that has Ricci cur-
vature [III.80] bounded from below. Interest in the
heat equations on manifolds is in part motivated by
nonlinear geometric flows and attempts to understand
their long-term behavior. One of the earliest geometric
flows was the harmonic map flow

$$\frac{\partial}{\partial t}\Phi - \Delta_M^N\Phi = 0,$$

which describes a deformation of the map $\Phi(t,\cdot)$
between two compact Riemannian manifolds M and N.
The operator $\Delta_M^N$ is a nonlinear Laplacian that is con-
structed by projecting $\Delta_M$ onto the tangent space of N.
This is a gradient flow associated with the energy

$$E[U] = \frac{1}{2}\int_M |dU|^2;$$

it measures the stretching of the map U between M and 
N. Under the assumption that the sectional curvature 
of N is nonpositive, it can be shown that the harmonic 
map heat flow is regular and converges, as $t \to +\infty$, to
a harmonic map between M and N, which is a critical
point of the energy functional E[U]. This heat equation
is used to establish the existence of harmonic maps
and to construct a continuous deformation of a given
map $\Phi(0,\cdot)$ to a harmonic map $\Phi(+\infty,\cdot)$. The curva-
ture assumption on the target manifold N is responsi-
ble for the crucial monotonicity properties of the har- 
monic map heat flow, which come to light through the 
use of the energy estimates. 

An even more spectacular application of a defor-
mation principle of this kind appears in the three-
dimensional Ricci flow [III.80]

$$\frac{\partial}{\partial t}g_{ij} = -2\,\mathrm{Ric}_{ij}(g),$$

which is a quasilinear heat evolution of a family of
metrics $g_{ij}(t)$ on a given manifold M. In this case the
flow is not necessarily regular; nonetheless, it can be 
extended as a flow with “surgeries” in such a way that 
the structure of the surgeries and the long-term behav-
ior of the flow can be precisely analyzed. This analy-
sis shows in particular that any compact three-dimensional
simply connected manifold is diffeomorphic to a three-
dimensional sphere, which gives the proof of the
Poincaré conjecture. 

The long-term behavior of the heat equation is also 
important in the analysis of reaction-diffusion sys- 
tems and associated biological phenomena. This was 
suggested already in the work of Turing [VI.94] in
his attempt to understand morphogenesis (the for-
mation of inhomogeneous patterns such as animal-
coat patterns from a nearly homogeneous initial state)
by means of exponential instabilities in the reaction-
diffusion equations

$$\frac{\partial}{\partial t}u = \mu\,\Delta u + f(u,v), \qquad \frac{\partial}{\partial t}v = \nu\,\Delta v + g(u,v).$$

These examples emphasize the long-term behavior of 
the heat equation, and in particular the tendency of its 
solutions to converge to an equilibrium, or alternatively 
to develop exponential instabilities. However, it turns 
out that the short-term behavior of the heat equation 
on a manifold M is of the utmost importance in connec- 
tion with the geometry and topology of M. This connec- 
tion is twofold: first, one seeks to establish a relation-
ship between the spectrum of $\Delta_M$ and the geometry of
M; second, one can use an analysis of the short-term
behavior to prove index theorems. The former aspect,
in the context of planar domains, is captured by Mark
Kac's well-known question, “Can one hear the shape of
a drum?” For manifolds it begins with the Weyl formula

$$\sum_{i=0}^{\infty} e^{-t\lambda_i} = \frac{1}{(4\pi t)^{n/2}}\bigl(\mathrm{Vol}(M) + O(t)\bigr)$$

as t tends to 0. The left-hand side of the identity is the
trace of the heat kernel of $\Delta_M$. That is,

$$\sum_{i=0}^{\infty} e^{-t\lambda_i} = \operatorname{tr} e^{-t\Delta_M} = \int_M p(t,x,x)\,dx,$$

where p(t, x, y) is such that any solution of the heat
equation $\partial u/\partial t - \Delta_M u = 0$ with $u(0,x) = u_0(x)$ is
given by the expression

$$u(t,x) = \int_M p(t,x,y)\,u_0(y)\,dy.$$

The right-hand side of the Weyl identity reflects the
short-term asymptotics of the heat kernel p(t, x, y).

The heat-flow approach to the proof of the index the- 
orems can be viewed as a refinement of both sides of 
the Weyl identity. The trace on the left-hand side is 
replaced by a more complicated “super-trace,” while 
the right-hand side involves full asymptotics of the 
heat kernel, which requires one to understand subtle 
cancelations. The simplest example of this kind is the 
Gauss-Bonnet formula 

$$\chi(M) = \frac{1}{4\pi}\int_M R,$$

which connects the Euler characteristic of a two-dimen- 
sional manifold M and the integral of its scalar curva-
ture. The Euler characteristic $\chi(M)$ arises from a linear
combination of traces of the heat flows associated with
the Hodge Laplacian $(d + d^*)^2$ restricted to the space
of exterior differential 0-forms, 1-forms, and 2-forms.
A proof of a general Atiyah-Singer index theorem
[V.2] involves heat flows associated with an operator 
given by the square of a Dirac operator. 


III.37 Hilbert Spaces 


The theory of vector spaces [1.3 §2.3] and linear 
maps [1.3 §4.2] underpins a large part of mathematics. 
However, angles cannot be defined using vector space 
concepts alone, since linear maps do not in general pre- 
serve angles. An inner product space can be thought of 
as a vector space with just enough extra structure for 
the notion of angle to make sense. 

The simplest example of an inner product on a vector
space is the standard scalar product defined on $\mathbb{R}^n$, the
space of all real sequences of length n, as follows. If
$v = (v_1,\dots,v_n)$ and $w = (w_1,\dots,w_n)$ are two such
sequences, then their scalar product, denoted $\langle v,w\rangle$,
is the sum $v_1w_1 + v_2w_2 + \cdots + v_nw_n$. (For example,
the scalar product of (3, 2, -1) and (1, 4, 4) is 3 × 1 +
2 × 4 + (-1) × 4 = 7.)

Among the properties that the scalar product has are 
the following two. 

(i) It is linear in each variable separately. That is,
$\langle\lambda u + \mu v, w\rangle = \lambda\langle u,w\rangle + \mu\langle v,w\rangle$ for any three
vectors u, v, and w and any two scalars $\lambda$ and $\mu$,
and similarly $\langle u, \lambda v + \mu w\rangle = \lambda\langle u,v\rangle + \mu\langle u,w\rangle$.

(ii) The scalar product $\langle v,v\rangle$ of any vector v with
itself is always a nonnegative real number, and is
zero only if v is zero.

In a general vector space, any function $\langle v,w\rangle$ of pairs of
vectors v and w that has these two properties is called 
an inner product, and a vector space with an inner prod- 
uct is called an inner product space. If the vector space 
has complex scalars, then instead of (i) one must use 
the following modification. 

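(i') For any three vectors u, v, and w and any two
scalars $\lambda$ and $\mu$, $\langle\lambda u+\mu v,w\rangle = \lambda\langle u,w\rangle + \mu\langle v,w\rangle$,
and $\langle u,\lambda v+\mu w\rangle = \bar\lambda\langle u,v\rangle + \bar\mu\langle u,w\rangle$. That is,
the inner product is conjugate-linear in the second
variable.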

The reason this has anything to do with angles is that 
in $\mathbb{R}^2$ and $\mathbb{R}^3$ the scalar product of two vectors v and
w works out as the length of v times the length of w
times the cosine of the angle between them. In particu-
lar, since a vector v makes an angle of zero with itself,
$\langle v,v\rangle$ is the square of the length of v.

This gives us a natural way to define length and angle
in an inner product space. The length, or norm, of a
vector v, denoted $\|v\|$, is $\sqrt{\langle v,v\rangle}$. Given two vectors v
and w, the angle between them is defined by the fact
that it lies between 0 and $\pi$ (or 180°) and its cosine is
$\langle v,w\rangle/\|v\|\,\|w\|$. Once length has been defined, we can
also talk about distance: the distance d(v, w) between
v and w is the length of their difference, or $\|v - w\|$.
This definition of distance satisfies the axioms for a
metric space [III.58]. From the notion of angle, we can
say what it is for v and w to be orthogonal to each
other: this simply means that $\langle v,w\rangle = 0$.
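These definitions translate almost word for word into code. The following sketch, with illustrative function names, implements the standard scalar product on real sequences together with the norm, angle, and distance derived from it, and reproduces the numerical example given earlier.

```python
import math

def inner(v, w):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(inner(v, v))

def angle(v, w):
    """Angle in [0, pi] whose cosine is <v, w> / (||v|| ||w||)."""
    return math.acos(inner(v, w) / (norm(v) * norm(w)))

def distance(v, w):
    return norm([a - b for a, b in zip(v, w)])

v, w = (3, 2, -1), (1, 4, 4)
print(inner(v, w))                              # 7, the example from the text
print(norm(v), math.degrees(angle(v, w)))
print(inner((1, 1), (1, -1)) == 0)              # True: these two vectors are orthogonal
```

Only `inner` would need to change to work with a different inner product; the other three functions are defined purely in terms of it.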

The usefulness of inner product spaces goes far 
beyond their ability to represent the geometry of two- 
and three-dimensional space. Where they really come 
into their own is if they are infinite dimensional. Then it 
becomes convenient if they satisfy the additional prop-
erty of completeness, which is briefly discussed at the
end of [III.64]. A complete inner product space is called
a Hilbert space. 

Two important examples of Hilbert spaces are the 
following. 

(i) $\ell_2$ is the natural infinite-dimensional generaliza-
tion of $\mathbb{R}^n$ with the standard scalar product. It is
the set of all infinite sequences $(a_1, a_2, a_3, \dots)$
such that the infinite sum $|a_1|^2 + |a_2|^2 + |a_3|^2 + \cdots$
converges. The inner product of
$(a_1, a_2, a_3, \dots)$ and $(b_1, b_2, b_3, \dots)$ is
$a_1b_1 + a_2b_2 + a_3b_3 + \cdots$ (which can be shown to converge
by the Cauchy-Schwarz inequality [V.22]).

(ii) $L_2[0, 2\pi]$ is the set of all functions f defined on
the interval $[0, 2\pi]$ of all real numbers between 0
and $2\pi$, such that the integral $\int_0^{2\pi} |f(x)|^2\,dx$
makes sense and is finite. The inner product
of two functions f and g is defined to be
$\int_0^{2\pi} f(x)g(x)\,dx$. (For technical reasons, this defi-
nition is not quite accurate, as a nonzero function
can have norm zero, but this problem can easily
be dealt with.)

The second of these examples is central to Fourier 
analysis. A trigonometric function is a function of the 
form cos(mx) or sin(nx). The inner product of any 
two different trigonometric functions is zero, so they
are all orthogonal. Even more importantly, the trigono-
metric functions serve as a coordinate system for the
space $L_2[0, 2\pi]$, in that every function f in the space
can be represented as an (infinite) linear combination
of trigonometric functions. This allows Hilbert spaces
to model sound waves: if the function f represents a
sound wave, then the trigonometric functions are the
pure tones that are its constituent parts. 
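The orthogonality of the trigonometric functions is easy to test numerically. The sketch below approximates the inner product on the interval from 0 to 2π by the midpoint rule; the helper name, the number of sample points, and the particular functions tested are arbitrary choices.

```python
import math

def inner_L2(f, g, steps=20_000):
    """Midpoint-rule approximation to the integral of f(x) g(x) over [0, 2*pi]."""
    h = 2 * math.pi / steps
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(steps))

cos2 = lambda x: math.cos(2 * x)
sin3 = lambda x: math.sin(3 * x)
cos5 = lambda x: math.cos(5 * x)

print(abs(inner_L2(cos2, sin3)) < 1e-9)      # True: orthogonal
print(abs(inner_L2(cos2, cos5)) < 1e-9)      # True: orthogonal
print(inner_L2(cos2, cos2), math.pi)         # the squared norm of cos(2x) is pi, not 1
```

The last line shows that cos(2x) does not have norm 1, so a trigonometric function has to be rescaled before it can belong to a set of unit vectors.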

These properties of trigonometric functions illus-
trate a very important general phenomenon in the
theory of Hilbert spaces: that every Hilbert space has
an orthonormal basis. This means a set of vectors $e_i$
with the following three properties:

• $\|e_i\| = 1$ for every i;

• $\langle e_i, e_j\rangle = 0$ whenever $i \ne j$; and

• every vector v in the space can be expressed as a
convergent sum of the form $\sum_i \lambda_i e_i$.

The trigonometric functions do not quite form an
orthonormal basis of $L_2[0, 2\pi]$, but suitable multiples
of them do. There are many contexts besides Fourier
analysis where one can obtain useful information about
a vector by decomposing it in terms of a given orthonor-
mal basis, and many general facts that can be deduced
from the existence of such bases. 

Hilbert spaces (with complex scalars) are also cen- 
tral to quantum mechanics. The vectors of a Hilbert 
space can be used to represent possible states of a
quantum mechanical system, and observable features 
of that system correspond to certain linear maps. 

For this and other reasons, the study of linear oper- 
ators [III.52] on Hilbert spaces is a major branch of 
mathematics: see operator algebras [IV.19].


III.38 Holomorphic Functions


A function f defined on some region D of the complex
plane is called holomorphic if it is differentiable. This
has the meaning one would expect: for every z in D the
quantity $(f(z + w) - f(z))/w$ should tend to a limit
as w tends to 0. This limit is denoted by $f'(z)$, and the
function $f'$ is called the derivative of f.

However, this bare definition hides the fact that com-
plex differentiability is very different from real differ-
entiability, roughly speaking because the linear approx-
imations it gives are all of a special kind, namely
“multiply by the complex number $\lambda$.” This has the effect
of making complex differentiability a far stronger prop-
erty than the differentiability of functions defined on
$\mathbb{R}$ or $\mathbb{R}^2$. For example, if f is holomorphic, then $f'$ is
automatically holomorphic as well: the analogue of this
statement for real functions is very definitely false.
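The strength of the definition lies in the requirement that the quotient tend to the same limit however w approaches 0. The sketch below compares the difference quotient along the real and imaginary directions for two functions chosen for illustration: z squared, which is holomorphic, and complex conjugation, which is perfectly smooth as a map of the plane but is not.

```python
def difference_quotient(f, z, w):
    return (f(z + w) - f(z)) / w

square = lambda z: z * z
conjugate = lambda z: z.conjugate()

z = 1 + 2j
for name, f in (("z**2", square), ("conj(z)", conjugate)):
    along_real = difference_quotient(f, z, 1e-6)
    along_imag = difference_quotient(f, z, 1e-6j)
    print(name, along_real, along_imag)
# z**2: both quotients are close to 2*z = 2+4j.
# conj(z): the quotients are close to 1 and -1, so no single limit exists.
```

For conjugation the quotient equals the conjugate of w divided by w, which depends only on the direction of w and never settles down.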

Holomorphic functions are discussed in more detail
in SOME FUNDAMENTAL MATHEMATICAL DEFINITIONS
[1.3 §5.6].


III.39 Homology and Cohomology 


If X is a topological space [III.92], then one can asso-
ciate with it a sequence of groups $H_n(X,R)$, where R is
a commutative ring [III.83 §1] such as $\mathbb{Z}$ or $\mathbb{C}$. These
groups, the homology groups of X (with coefficients
in R), are a powerful invariant: powerful because they
contain a great deal of information about X but are
nevertheless easy to compute, at least compared with
some other invariants. The closely related cohomology
groups $H^n(X,R)$ are more useful still because they can
be made into a ring: to oversimplify slightly, an ele-
ment of the cohomology group $H^n(X)$ is an equiva-
lence class [1.2 §2.3] [Y] of a subspace Y of codimen-
sion n. (Of course, for this to make true sense X should
be a fairly nice space such as a manifold [1.3 §6.9].)
Then, if [Y] and [Z] belong to $H^n(X,R)$ and $H^m(X,R)$,
respectively, their product is $[Y \cap Z]$. Since $Y \cap Z$ “typ-
ically” has codimension n + m, the equivalence class
$[Y \cap Z]$ belongs to $H^{n+m}(X,R)$. Homology and cohom-
ology groups are described in more detail in algebraic
TOPOLOGY [IV.10].

The concepts of homology and cohomology have 
become far more general than the above discussion 
suggests, and are no longer tied to topological spaces: 
for instance, the notion of group cohomology is of great 
importance in algebra. Even within topology, there are 
many different homology and cohomology theories. In 
1945, Eilenberg and Steenrod devised a small number 
of axioms that greatly clarified the area: a homology 
theory is any association of groups with topological 
spaces that satisfies these axioms, and the fundamen- 
tal properties of homology theories follow from the 
axioms. 


III.40 Homotopy Groups 


If X is a topological space [III.92], then a loop in X is 
a path that begins and ends at the same point; or, more 
formally, a continuous function $f : [0,1] \to X$ such that
f(0) = f(1). The point where the path begins and ends
is called the base point. If two loops have the same base 
point, they are called homotopic if one can be continu- 
ously deformed to the other, with all the intermediate 
paths living in X and beginning and ending at the given 
base point. For example, if X is the plane $\mathbb{R}^2$, then any
two paths that begin and end at (0,0) are homotopic, 
whereas if X is the plane with the origin removed, then 
whether or not two paths (that begin and end at some 
other point) are homotopic depends on whether or not 
they go around the origin the same number of times. 
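In the plane with the origin removed, the number of times a loop goes around the origin can be computed directly, at least for loops made of straight segments. The sketch below adds up the signed angles that the segments sweep out as seen from the origin; the function names and the sample loops (circles approximated by 200-sided polygons) are illustrative choices.

```python
import math

def winding_number(points):
    """Number of times a closed polygonal loop goes around the origin (counterclockwise positive).

    `points` lists the vertices in order; the loop closes back to the first vertex
    and must not pass through the origin.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        # Signed angle swept out, as seen from the origin, along this segment.
        total += math.atan2(x0 * y1 - x1 * y0, x0 * x1 + y0 * y1)
    return round(total / (2 * math.pi))

def circle(times, centre=(0.0, 0.0), n=200):
    """Vertices of a unit circle around `centre`, traversed `times` times."""
    cx, cy = centre
    return [(cx + math.cos(2 * math.pi * times * k / n),
             cy + math.sin(2 * math.pi * times * k / n)) for k in range(n)]

print(winding_number(circle(1)), winding_number(circle(2)),
      winding_number(circle(1, centre=(3.0, 0.0))))   # 1 2 0
```

The first two loops share the base point (1, 0) but wind around the origin a different number of times, so they are not homotopic in the punctured plane; the third does not enclose the origin at all.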

Homotopy is an equivalence relation [1.2 §2.3], 
and the equivalence classes of paths with base point x 
form the fundamental group of X, relative to x, which 
is denoted by $\pi_1(X,x)$. If X is connected, then this does
not depend on x and we can write $\pi_1(X)$ instead. The
group operation is “concatenation”: given two paths 
that begin and end at x, their “product” is the com- 
bined path that goes along one and then the other, and 
the product of equivalence classes is then defined to 
be the equivalence class of the product. This group is a 
very important invariant (see for instance geometric 
AND COMBINATORIAL GROUP THEORY [IV.11 §7]); it is
the first in a sequence of higher-dimensional homotopy 
groups, which are described in algebraic topology 
[IV.10 §§2, 3].


III.41 The Hyperbolic Plane


The parallel postulate of Euclid [VI.2] states that for
any straight line L in the plane and any point x not on L 
there is exactly one straight line M that passes through 
x and does not meet L. For over 2000 years a central 
problem in mathematics was to decide whether this 
statement could be deduced from the other axioms of 
Euclidean geometry. Eventually, Gauss [VI.26], Bolyai
[VI.34], and Lobachevskii [VI.31] developed hyperbolic
geometry, in which all the other axioms hold, but the
parallel postulate is false because there can be more
than one line through x that does not meet L. The
history of this discovery is explained in geometry
[II.2].

The hyperbolic plane can be defined in several ways. 
Two of the most popular are called the half-plane 
model and the disk model, which are Riemannian met-
rics [1.3 §6.10] defined on the upper half-plane and
the unit disk, respectively. Almost all the familiar con-
cepts of Euclidean geometry can be defined for hyper-
bolic geometry, but their properties are different. For
example, the angles of a hyperbolic triangle always
add up to less than $\pi$. More details about the hyper-
bolic plane and how it is constructed can be found
in SOME FUNDAMENTAL MATHEMATICAL DEFINITIONS
[1.3 §§6.6, 6.10].


III.42 The Ideal Class Group 


THE FUNDAMENTAL THEOREM OF ARITHMETIC [V.16]
asserts that every positive integer can be written in
exactly one way (apart from reordering) as a product of 
primes. Analogous theorems are true in other contexts 
as well: for example, there is a unique factorization the- 
orem for polynomials, and another one for Gaussian 
integers, that is, numbers of the form a + ib where a 
and b are integers. 

However, for most number fields [III.65], the asso- 
ciated “ring of integers” does not have the unique- 
factorization property. For example, in the ring 
[III.83 §1] of numbers of the form $a + b\sqrt{-5}$ with a
and b integers, one can factorize 6 either as 2 × 3 or
as $(1 + \sqrt{-5})(1 - \sqrt{-5})$.
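The two factorizations can be verified, and shown to be genuinely different, with a few lines of arithmetic. In the sketch below a number a + b√-5 is stored as a pair of integers (a, b); this representation and the helper names are assumptions of the example. The key tool is the norm a^2 + 5b^2, which is multiplicative.

```python
def multiply(x, y):
    """Product of a + b*sqrt(-5) and c + d*sqrt(-5), each stored as a pair of integers."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    """N(a + b*sqrt(-5)) = a**2 + 5*b**2, which is multiplicative."""
    a, b = x
    return a * a + 5 * b * b

print(multiply((2, 0), (3, 0)), multiply((1, 1), (1, -1)))   # (6, 0) (6, 0)

# A proper factor of 2, 3, or 1 +/- sqrt(-5) would need norm 2 or 3 (these have norms 4, 9, 6, 6).
# Any element with norm at most 3 has |a| <= 1 and b == 0, so a small search settles the matter.
small_norms = {norm((a, b)) for a in range(-2, 3) for b in range(-1, 2)}
print(2 in small_norms, 3 in small_norms)                    # False False
```

Since no element has norm 2 or 3, none of the four factors can be broken down further, so the two factorizations of 6 really are different.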

The ideal class group is a way of measuring how badly 
unique factorization fails. Given any ring of integers of 
a number field, one can define a multiplicative structure
on its set of ideals [III.83 §2], for which unique fac-
torization holds. The elements of the ring itself corre-
spond to so-called “principal ideals,” so if every ideal is
principal, then unique factorization holds for the ring. 
If there are nonprincipal ideals, then one can define 
a natural equivalence relation [1.2 §2.3] on them 
in such a way that the equivalence classes, which are 
called ideal classes, form a group [1.3 §2.1]. This group 
is the ideal class group. All principal ideals belong to 
the class that forms the identity of this group, so the 
larger and more complex the ideal class group is, the further
the ring is from having the unique-factorization prop- 
erty. For more details, see algebraic numbers [IV.3], 
and in particular section 7. 


III.43 Irrational and Transcendental 
Numbers 

Ben Green 


An irrational number is one that cannot be written as 
a/b with both a and b integers. A great many naturally 
occurring numbers, such as $\sqrt{2}$, e, and $\pi$, are irrational.
The following proof that $\sqrt{2}$ is irrational is one of the
best-known arguments in all of mathematics. Suppose
that $\sqrt{2} = a/b$; since common factors can be canceled,
we may assume that a and b have no common factor;
we have $a^2 = 2b^2$, which means that a must be even;
write a = 2c; but then $4c^2 = 2b^2$, which implies that
$2c^2 = b^2$, and hence b must be even too; this, how-
ever, is contrary to our assumption that a and b were
coprime. 

Several famous conjectures in mathematics ask 
whether certain specific numbers are rational or not. 
For example, $\pi + e$ and $\pi^e$ are not known to be
irrational, and neither is Euler’s constant:

$$\gamma = \lim_{n\to\infty}\Bigl(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n\Bigr) \approx 0.577215\ldots.$$

It is known that $\zeta(3) = 1 + 2^{-3} + 3^{-3} + \cdots$ is irrational.
Almost certainly, $\zeta(5), \zeta(7), \zeta(9), \dots$ are all irrational
as well. However, although it has been shown that
infinitely many of these numbers are irrational, no
specific one is known to be. 

A classic proof is that of the irrationality of $e$. If

$$e = \sum_{j=0}^{\infty} \frac{1}{j!}$$

were equal to $p/q$, then we would have

$$p\,(q-1)! = \sum_{j=0}^{\infty} \frac{q!}{j!}.$$

The left-hand side and the terms of the sum with $j \le q$
are all integers. Therefore the quantity

$$\sum_{j \ge q+1} \frac{q!}{j!} = \frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \cdots$$

is also an integer. But it is not hard to show that this
quantity lies strictly between 0 and 1, a contradiction.
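
The final step of this argument can be checked numerically. The following is a minimal Python sketch rather than part of the proof; the function name `tail_partial_sum` and the cutoff `terms` are illustrative assumptions. It adds up the first few terms of $1/(q+1) + 1/((q+1)(q+2)) + \cdots$ exactly and compares them with $1/q$, the value of the geometric series $\sum_{k \ge 1} (q+1)^{-k}$ that dominates the sum term by term.

```python
from fractions import Fraction

def tail_partial_sum(q, terms=30):
    """Exact partial sum of 1/(q+1) + 1/((q+1)(q+2)) + ... (first `terms` terms)."""
    total = Fraction(0)
    term = Fraction(1)
    for k in range(1, terms + 1):
        term /= (q + k)          # term is now 1/((q+1)(q+2)...(q+k))
        total += term
    return total

for q in range(1, 8):
    s = tail_partial_sum(q)
    # Each term is at most (q+1)^(-k), so the sum stays below 1/q <= 1.
    print(q, float(s), s < Fraction(1, q))
```

Since every term is positive and the whole sum is below $1/q \le 1$, the quantity cannot be an integer, which is the contradiction used above.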

The principle used here, that a nonzero integer must 
have absolute value at least one, is surprisingly pow- 
erful in the theory of irrational and transcendental 
numbers. 

Some numbers are more irrational than others. In a
sense, the most irrational number is $\tau = \frac{1}{2}(1 + \sqrt{5})$, the
golden ratio, because the best rational approximations
to it, which are ratios of consecutive Fibonacci numbers,
approach it rather slowly. There is also a very
elegant proof that $\tau$ is irrational. This is based on the
observation that the $\tau \times 1$ rectangle $R$ may be divided
into a square of side 1 and a $1/\tau \times 1$ rectangle. If $\tau$
were rational, then we would be able to create a rectangle
with integer sides that was similar to $R$. From
this we could remove a square, and we would be left
with a smaller rectangle with integer sides that would
still be similar to $R$. We could continue this process ad
infinitum, which is clearly impossible.
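
To make "rather slowly" concrete, here is a small Python sketch (an illustration, not taken from the text; the helper names are assumptions). For each ratio $p/q$ of consecutive Fibonacci numbers it prints $q^2\,|\tau - p/q|$. If the approximations were unusually good this quantity would shrink toward zero; instead it settles near $1/\sqrt{5}$.

```python
import math

TAU = (1 + math.sqrt(5)) / 2  # the golden ratio

def fibonacci_ratios(count):
    """Yield pairs (p, q) of consecutive Fibonacci numbers, starting with (2, 1)."""
    q, p = 1, 1
    for _ in range(count):
        q, p = p, p + q
        yield p, q

def scaled_errors(count):
    """Return q^2 * |tau - p/q| for the first `count` Fibonacci ratios."""
    return [q * q * abs(TAU - p / q) for p, q in fibonacci_ratios(count)]

for (p, q), err in zip(fibonacci_ratios(10), scaled_errors(10)):
    print(f"{p}/{q}: q^2 * error = {err:.4f}")
```

The sketch uses floating-point arithmetic, so it is only reliable for moderately small denominators.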

A transcendental number is one which is not algebraic,
that is to say, is not the root of a polynomial
equation with integer coefficients. Thus $\sqrt{2}$ is not
transcendental, since it solves $x^2 - 2 = 0$, and neither is

Vif-, 

Are there, in fact, any transcendental numbers? This
question was answered by Liouville [VI.39] in 1844,
who showed that various numbers were transcendental,
of which

$$\kappa = \sum_{n=1}^{\infty} 10^{-n!} = 0.1100010000000000000000010\ldots$$

is a well-known example. This is not algebraic, because
it can be approximated more accurately by rationals
than any algebraic number can. For example, the
rational approximation 110001/1000000 is very close
indeed to $\kappa$, but its denominator is not particularly
large.
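
The quality of this approximation can be checked with exact arithmetic. The sketch below is illustrative only: `liouville_partial` is a hypothetical helper, and the sum of the first five terms is used as a stand-in for $\kappa$ itself, which is harmless here because the omitted terms are smaller than $10^{-700}$.

```python
from fractions import Fraction
from math import factorial

def liouville_partial(m):
    """Exact value of the sum of 10^(-n!) for n = 1, ..., m."""
    return sum(Fraction(1, 10 ** factorial(n)) for n in range(1, m + 1))

approx = liouville_partial(3)        # the approximation 110001/1000000
kappa_proxy = liouville_partial(5)   # stand-in for kappa (assumption, see above)
q = approx.denominator
error = kappa_proxy - approx

# Compare the error with powers of 1/q.
print(approx, float(error * q ** 4))
```

With denominator $q = 10^6$ the error is about $q^{-4}$, and taking more terms of the series makes the exponent grow without bound, which is the behavior that no algebraic number can exhibit.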

Liouville showed that if $\alpha$ is a root of a polynomial
of degree $n$, then

$$\left|\alpha - \frac{a}{q}\right| > \frac{C}{q^n}$$

for all integers $a$ and $q$ and for some constant $C$
depending on $\alpha$. In words, $\alpha$ cannot be too well
approximated by rationals. Roth later proved that the exponent
$n$ here can actually be replaced by $2 + \varepsilon$ for any $\varepsilon > 0$.
(For more on these topics, see LIOUVILLE’S THEOREM
AND ROTH’S THEOREM [V.25].)

A completely different approach to the existence of
transcendental numbers was discovered by Cantor
[VI.54] thirty years later. He proved that the set of
algebraic numbers is countable [III.11], which means,
roughly speaking, that they may be listed in order. More
precisely, there is a surjective map from $\mathbb{N}$, the set of
natural numbers, to the set of algebraic numbers.

By contrast, the real numbers R are not countable. 
Cantor’s famous proof of this uses a diagonalization 
argument to show that any listing of all the real num- 
bers must be incomplete. 

There must, therefore, be real numbers that are not 
algebraic. 

It is generally rather difficult to prove that a specific
number is transcendental. For instance, it is by
no means the case that all transcendental numbers are
very well approximated by rationals; this merely provides
a useful sufficient condition. There are other ways
to establish that numbers are transcendental. Both $e$
and $\pi$ are known to be transcendental, and it is known
that $|e - a/b| > C(\varepsilon)/b^{2+\varepsilon}$ for all $\varepsilon > 0$, so $e$ is not
all that well approximated by rationals. Since $\zeta(2m)$ is
always a rational multiple of $\pi^{2m}$, it follows that the
numbers $\zeta(2), \zeta(4), \ldots$ are all transcendental.

The modern theory of transcendental numbers contains
a wealth of beautiful results. An early one is
the Gel’fond-Schneider theorem, which says that $\alpha^\beta$
is transcendental if $\alpha \ne 0, 1$ is algebraic, and if $\beta$ is
algebraic but not rational. In particular, $\sqrt{2}^{\sqrt{2}}$ is
transcendental. There is also the six-exponentials theorem,
which states that if $x_1, x_2$ are two linearly independent
complex numbers, and if $y_1, y_2, y_3$ are three linearly
independent complex numbers, then at least one of the
six numbers

$$e^{x_1y_1},\ e^{x_1y_2},\ e^{x_1y_3},\ e^{x_2y_1},\ e^{x_2y_2},\ e^{x_2y_3}$$

is transcendental. Related to this is the (as yet unsolved)
four-exponentials conjecture: if $x_1$ and $x_2$ are two linearly
independent complex numbers, and if $y_1$ and $y_2$
are linearly independent, then at least one of the four
exponentials

$$e^{x_1y_1},\ e^{x_1y_2},\ e^{x_2y_1},\ e^{x_2y_2}$$

is transcendental.


The Ising model is one of the fundamental models of
statistical physics. It was originally designed as a model
for the behavior of a ferromagnetic material when it is
heated up, but it has since been used to model many
other phenomena.

The following is a special case of the model. Let $G_n$
be the set of all pairs of integers with absolute value
at most $n$. A configuration is a way of assigning to
each point $x$ in $G_n$ a number $\sigma_x$, which equals 1 or
$-1$. The points represent atoms and $\sigma_x$ represents
whether $x$ has “spin up” or “spin down.” With each
configuration $\sigma$ we associate an “energy” $E(\sigma)$, which
equals $-\sum \sigma_x \sigma_y$, where the sum is taken over all pairs
of neighboring points $x$ and $y$. Thus, the energy is high
if many points have different signs from some of their
neighbors, and low if $G_n$ is divided into large clusters
of points with the same sign.

Each configuration is assigned a probability, which
is proportional to $e^{-E(\sigma)/T}$. Here, $T$ is a positive real
number that represents temperature. The probability
of a given configuration is therefore higher when it has
small energy, so there is a tendency for a typical
configuration to have clusters of points with the same sign.
However, as the temperature $T$ increases, this clustering
effect becomes smaller since the probabilities
become more equal.
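
For very small $n$ these probabilities can be computed by brute force. The Python sketch below is illustrative and rests on an assumption the text leaves open: two points are treated as neighboring when they differ by 1 in exactly one coordinate. All function names are hypothetical.

```python
import itertools
import math

def grid(n):
    """All integer points (i, j) with |i| <= n and |j| <= n."""
    return [(i, j) for i in range(-n, n + 1) for j in range(-n, n + 1)]

def neighbor_pairs(n):
    """Pairs of grid points at distance 1 (the assumed meaning of 'neighboring')."""
    points = set(grid(n))
    pairs = []
    for (i, j) in points:
        for (di, dj) in ((1, 0), (0, 1)):
            if (i + di, j + dj) in points:
                pairs.append(((i, j), (i + di, j + dj)))
    return pairs

def energy(sigma, n):
    """E(sigma) = -(sum of sigma_x * sigma_y over neighboring pairs)."""
    return -sum(sigma[x] * sigma[y] for x, y in neighbor_pairs(n))

def probability(sigma, n, T):
    """exp(-E(sigma)/T), normalized by the sum over all configurations."""
    points = grid(n)
    total = 0.0
    for spins in itertools.product((1, -1), repeat=len(points)):
        total += math.exp(-energy(dict(zip(points, spins)), n) / T)
    return math.exp(-energy(sigma, n) / T) / total

n = 1  # a 3 x 3 grid, so only 2^9 = 512 configurations
all_up = {x: 1 for x in grid(n)}
checkerboard = {(i, j): (1 if (i + j) % 2 == 0 else -1) for (i, j) in grid(n)}

for T in (1.0, 10.0):
    print(T, probability(all_up, n, T), probability(checkerboard, n, T))
```

The enumeration grows as $2^{(2n+1)^2}$, so this approach is only a way of seeing the definitions at work, not a practical method.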

The two-dimensional Ising model with zero potential
is the limit of this model as $n$ tends to infinity. For a
more detailed discussion of the general model and of
the phase transition associated with it, see PROBABILISTIC
MODELS OF CRITICAL PHENOMENA [IV.26 §5].


III.45 Jordan Normal Form


Suppose that you are presented with an $n \times n$ real or
complex matrix [I.3 §4.2] $A$ and would like to understand
it. You might ask how it behaves as a linear map
[I.3 §4.2] on $\mathbb{R}^n$ or $\mathbb{C}^n$, or you might wish to know what
the powers of $A$ are. In general, answering these questions
is not particularly easy, but for some matrices it
is very easy. For example, if $A$ is a diagonal matrix (that
is, one whose nonzero entries all lie on the diagonal),
then both questions can be answered immediately: if
$x$ is a vector in $\mathbb{R}^n$ or $\mathbb{C}^n$, then $Ax$ will be the vector
obtained by multiplying each entry of $x$ by the corresponding
diagonal element of $A$, and to compute $A^m$
you just raise each diagonal entry to the power $m$.

So, given a linear map $T$ (from $\mathbb{R}^n$ to $\mathbb{R}^n$ or from $\mathbb{C}^n$
to $\mathbb{C}^n$), it is very nice if we can find a basis with respect
to which $T$ has a diagonal matrix; if this can be done,
then we feel that we “understand” the linear map. Saying
that such a basis exists is the same as saying there
is a basis consisting of eigenvectors [I.3 §4.3]: a linear
map is called diagonalizable if it has such a basis. Of
course, we may apply the same terminology to a matrix
(since a matrix $A$ determines a linear map on $\mathbb{R}^n$ or $\mathbb{C}^n$,
by mapping $x$ to $Ax$). So a matrix is also called diagonalizable
if it has a basis of eigenvectors, or equivalently
if there is an invertible matrix $P$ such that $P^{-1}AP$ is
diagonal.

Is every matrix diagonalizable? Over the reals, the 
answer is no for uninteresting reasons, since there need 
not even be any eigenvectors: for example, a rotation in 
the plane clearly has no eigenvectors. So let us restrict 
our attention to matrices and linear maps over the 
complex numbers. 

If we have a matrix $A$, then its characteristic polynomial,
namely $\det(A - tI)$, certainly has a root, by the
FUNDAMENTAL THEOREM OF ALGEBRA [V.15]. If $\lambda$ is such
a root, then standard facts from linear algebra tell us
that $A - \lambda I$ is singular, and therefore that there is a
nonzero vector $x$ such that $(A - \lambda I)x = 0$, or equivalently that
$Ax = \lambda x$. So we do have at least one eigenvector.
Unfortunately, however, there need not be enough eigenvectors
to form a basis. For example, consider the linear
map $T$ that sends $(1, 0)$ to $(0, 1)$ and $(0, 1)$ to $(0, 0)$. The
matrix of this map (with respect to the obvious basis)
is $\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$. This matrix is not diagonalizable. One way of
seeing why not is the following. The characteristic polynomial
turns out to be $t^2$, of which the only root is 0.
An easy computation reveals that if $Ax = 0$ then $x$ has
to be a multiple of $(0, 1)$, so we cannot find two linearly
independent eigenvectors. A rather more elegant
method of proof is to observe that $T^2$ is the zero matrix
(since it maps each of $(1, 0)$ and $(0, 1)$ to $(0, 0)$), so that
if $T$ were diagonalizable, then its diagonal matrix would
have to be zero (since any nonzero diagonal matrix has
a nonzero square), and therefore $T$ would have to be
the zero matrix, which it is not.

The same argument shows that any matrix $A$ such
that $A^k = 0$ for some $k$ (such matrices are called
nilpotent) must fail to be diagonalizable, unless $A$ is
itself the zero matrix. This applies, for example, to any
matrix that has all of its nonzero entries below the main
diagonal.

What, then, can we say about our nondiagonalizable
matrix $T$ above? In a sense, one feels that $(1, 0)$ is
“nearly” an eigenvector, since we do have $T^2(1, 0) =
(0, 0)$. So what happens if we extend our point of view
by allowing such vectors? One would say that a vector
$x$ is a generalized eigenvector of $T$, with eigenvalue
$\lambda$, if some power of $T - \lambda I$ maps $x$ to zero. For
instance, in our example above the vector $(1, 0)$ is a
generalized eigenvector with eigenvalue 0. And, just as
we have an “eigenspace” associated with each eigenvalue
$\lambda$ (defined to be the space of all eigenvectors with
eigenvalue $\lambda$), we also have a “generalized eigenspace,”
which consists of all generalized eigenvectors with
eigenvalue $\lambda$.

Diagonalizing a matrix corresponds exactly to decomposing
the vector space ($\mathbb{C}^n$) into eigenspaces. So
it is natural to hope that one could decompose the vec- 
tor space into generalized eigenspaces for any matrix. 
And this turns out to be true. The way of breaking up 
the space is called Jordan normal form, which we shall 
now describe in more detail. 

Let us pause for a moment and ask: what is the very
simplest situation in which we get a generalized eigenvector?
It would surely be the obvious generalization
of the above example to $n$ dimensions. In other words,
we have a linear map $T$ that sends $e_1$ to $e_2$, $e_2$ to $e_3$, and
so on, until $e_{n-1}$ is sent to $e_n$, with $e_n$ itself mapped to
zero. This corresponds to the matrix

$$\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}.$$

Although this matrix is not diagonalizable, its behavior
is at least very easy to understand.

The Jordan normal form of a matrix will be a diagonal
sum of matrices that are easily understood in the way
that this one is. Of course, we have to consider eigenvalues
other than zero: accordingly, we define a block
to be any matrix of the form

$$\begin{pmatrix}
\lambda & 0 & 0 & \cdots & 0 & 0 \\
1 & \lambda & 0 & \cdots & 0 & 0 \\
0 & 1 & \lambda & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & \lambda
\end{pmatrix}.$$

Note that this matrix $A$, with $\lambda I$ subtracted, is precisely
the matrix above, so that $(A - \lambda I)^n$ is indeed zero. Thus,
a block represents a linear map that is indeed easy to
understand, and all its vectors are generalized eigenvectors
with the same eigenvalue. The Jordan normal
form theorem tells us that every matrix can be decomposed
into such blocks: that is, a matrix is in Jordan
normal form if it is of the form

$$\begin{pmatrix}
B_1 & 0 & \cdots & 0 \\
0 & B_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & B_k
\end{pmatrix}.$$

Here, the $B_i$ are blocks, which can have different sizes,
and the 0s represent submatrices of the matrix with
sizes depending on the block sizes. Note that a block
of size 1 simply consists of an eigenvector.

Once a matrix $A$ is put into Jordan normal form, we
have broken up the space into subspaces on which it
is easy to understand the action of $A$. For example,
suppose that $A$ is the matrix

$$\begin{pmatrix}
4 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 4 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 4 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 2
\end{pmatrix},$$

which is made out of three blocks, of sizes 3, 2, and
2. Then we can instantly read off a great deal of information
about $A$. For instance, consider the eigenvalue
4. Its algebraic multiplicity (its multiplicity as a root of
the characteristic polynomial) is 5, since it is the sum
of the sizes of all the blocks with eigenvalue 4, while its
geometric multiplicity (the dimension of its eigenspace)
is 2, since it is the number of such blocks (because
in each block we only have one actual eigenvector).
And even the minimum polynomial of the matrix (the
smallest-degree polynomial $P(t)$ such that $P(A) = 0$) is
easy to write down. The minimum polynomial of each
block can be written down instantly: if the block has
size $k$ and generalized eigenvalue $\lambda$, then it is $(t - \lambda)^k$.
The minimum polynomial of the whole matrix is then
the “lowest common multiple” of the polynomials for
the individual blocks. For the matrix above, we get
$(t - 4)^3$, $(t - 4)^2$, and $(t - 2)^2$ for the three blocks,
so the minimum polynomial of the whole matrix is
$(t - 4)^3(t - 2)^2$.
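
The claim about the minimum polynomial can be verified directly for this $7 \times 7$ example. The following pure-Python sketch is an illustration only (the helper names are assumptions, and no linear-algebra library is used): it checks that $(A - 4I)^3(A - 2I)^2$ is the zero matrix, while lowering either exponent by one gives a nonzero matrix.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(X, lam):
    """Return X - lam * I."""
    n = len(X)
    return [[X[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]

def power(X, k):
    result = identity(len(X))
    for _ in range(k):
        result = matmul(result, X)
    return result

def is_zero(X):
    return all(entry == 0 for row in X for entry in row)

# The example matrix: blocks of sizes 3 and 2 for eigenvalue 4, size 2 for eigenvalue 2.
A = [
    [4, 0, 0, 0, 0, 0, 0],
    [1, 4, 0, 0, 0, 0, 0],
    [0, 1, 4, 0, 0, 0, 0],
    [0, 0, 0, 4, 0, 0, 0],
    [0, 0, 0, 1, 4, 0, 0],
    [0, 0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 1, 2],
]

def evaluate(a, b):
    """Return (A - 4I)^a (A - 2I)^b."""
    return matmul(power(shift(A, 4), a), power(shift(A, 2), b))

print(is_zero(evaluate(3, 2)), is_zero(evaluate(2, 2)), is_zero(evaluate(3, 1)))
```

Integer arithmetic keeps the check exact, which matters here because the question is whether a product is exactly zero.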

There are some generalizations of Jordan normal
form, away from the context of linear maps acting on
vector spaces. For example, there is an analogue of the
theorem that applies to Abelian groups, which turns
out to be the statement that every finite Abelian group
can be decomposed as a direct product of cyclic groups.


III.46 Knot Polynomials

W. B. R. Lickorish 


1 Knots and Links 


A knot is a curve in three-dimensional space that is 
closed (in other words, it stops where it began) and 
never meets itself along its way. A link is several such 
curves, all disjoint from one another, which are called 
the components of the link. Some simple examples of 
knots and links are the following: 



[Figure: the unknot, the trefoil, and the figure eight knot; the unlink, the Hopf link, and the Whitehead link.]


Two knots are equivalent or “the same” if one can
be moved continuously, never breaking the “string,” to
become the other. Isotopy is the technical term for such
movement. For example, the following knots are the
same:



The first problem in knot theory is how to decide
if two knots are the same. Two knots may appear to
be very different, but how does one prove that they
are different? In classical geometry two triangles are
the same (or congruent) if one can be moved rigidly
on to the other. Numbers that measure side-lengths
and angles are assigned to each triangle to help determine
if this is the case. Similarly, mathematical entities
called invariants can be associated with knots and links
in such a way that if two links have different invariants,
then they cannot be the same link. Many invariants
relate to the geometry or topology of the complement
of a link in three-dimensional space. The fundamental
group [IV.10 §2] of this complement is an
excellent invariant, but algebraic techniques are then
needed to distinguish the groups. The polynomial of
J. W. Alexander (published in 1926) is a link invariant
derived from distinguishing such groups. Although
rooted in algebraic topology [IV.10], the Alexander
polynomial has long been known to satisfy a skein relation
(see below). The HOMFLY polynomial of 1984 generalizes
the Alexander polynomial and can be based on
the simple combinatorics of skein theory alone.

1.1 The HOMFLY Polynomial 

Suppose that links are oriented so that directions, indicated
by arrows, are given to all components. To each
oriented link $L$ is assigned its HOMFLY polynomial
$P(L)$, a polynomial with integer coefficients in two variables
$v$ and $z$ (allowing both positive and negative
powers of $v$ and $z$). The polynomials are such that

$$P(\text{unknot}) = 1 \qquad (1)$$

and there is a linear skein relation

$$v^{-1}P(L_+) - vP(L_-) = zP(L_0). \qquad (2)$$

This means that whenever three links have identical
diagrams except near one crossing, where they are as
follows

[Figure: the diagrams of $L_+$, $L_-$, and $L_0$ near the crossing.]

then this equation holds.

This turns out to be good notation, although one
could in principle use $x$ and $y$ in place of $v^{-1}$ and $-v$.
Although Alexander’s polynomial satisfied a particular
instance of (2), it took almost sixty years and the discovery
of the Jones polynomial for it to be realized that this
general linear relation can be used. Note that there are
two possible types of crossing in a diagram of an oriented
link. A crossing is positive if, when approaching
the crossing along the under-passing arc in the direction
of the arrow, the other directed arc is seen to cross
over from left to right. If the over-passing arc crosses
from right to left, the crossing is negative. When interpreting
the skein relation at a crossing of a link $L$, it is
vital that $L$ be regarded as $L_+$ if the crossing is positive
and as $L_-$ if it is negative.

The theorem that underpins this theory, which is 
not at all obvious, is that it is possible to assign such 
polynomials to oriented links in a coherent fashion, 
uniquely, independent of any choice of a link’s diagram. 
A proof of this is given in Lickorish (1997). 

1.2 HOMFLY Calculations 

In a diagram of a knot it is always possible to change
some of the crossings, from over to under, to achieve a
diagram of the unknot. Links can be undone similarly.
Using this, the polynomial of any link can be calculated
from the above equations, though the length of the calculation
is exponential in the number of crossings. The
following is a calculation of $P(\text{trefoil})$. Firstly, consider
the following instance of the skein relation:

$$v^{-1}P(\text{unknot}) - vP(\text{unknot}) = zP(\text{unlink}).$$

Substituting the polynomial 1 for the polynomials of
the two unknots, this shows that the HOMFLY polynomial
of the two-component unlink is $z^{-1}(v^{-1} - v)$. A
second usage of the skein relation is

$$v^{-1}P(\text{unlink}) - vP(\text{Hopf link}) = zP(\text{unknot}).$$

Substituting the previous answer for the unlink shows
that the HOMFLY polynomial of the Hopf link is
$z^{-1}(v^{-3} - v^{-1}) - zv^{-1}$. Finally, consider the following
instance of the skein relation:

$$v^{-1}P(\text{unknot}) - vP(\text{trefoil}) = zP(\text{Hopf link}).$$

Substitution of the polynomial for the Hopf link already
calculated and, of course, the value 1 for the unknot
shows that

$$P(\text{trefoil}) = -v^{-4} + 2v^{-2} + z^2v^{-2}.$$

A similar calculation shows that

$$P(\text{figure eight}) = v^2 - 1 + v^{-2} - z^2.$$
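
The algebra in the trefoil calculation is easy to mechanize. The sketch below is an illustration under stated assumptions: a Laurent polynomial in $v$ and $z$ is stored as a dictionary mapping exponent pairs `(i, j)` to the coefficient of $v^i z^j$, and the helper names are hypothetical. It only reproduces the three substitutions described above; it does not compute the polynomial of an arbitrary diagram.

```python
from collections import defaultdict

def add(p, q):
    """Sum of two Laurent polynomials stored as {(i, j): coeff} for v^i z^j."""
    result = defaultdict(int)
    for poly in (p, q):
        for key, coeff in poly.items():
            result[key] += coeff
    return {k: c for k, c in result.items() if c != 0}

def mul(p, q):
    """Product of two Laurent polynomials."""
    result = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            result[(a + d, b + e)] += c * f
    return {k: c for k, c in result.items() if c != 0}

def mono(coeff, i, j):
    """The monomial coeff * v^i * z^j."""
    return {(i, j): coeff}

ONE = mono(1, 0, 0)  # P(unknot) = 1

def solve_for_minus(p_plus, p_zero):
    """Solve v^-1 P(L+) - v P(L-) = z P(L0) for P(L-)."""
    inner = add(mul(mono(1, -1, 0), p_plus), mul(mono(-1, 0, 1), p_zero))
    return mul(mono(1, -1, 0), inner)

# First relation: v^-1 * 1 - v * 1 = z * P(unlink).
unlink = mul(mono(1, 0, -1), add(mono(1, -1, 0), mono(-1, 1, 0)))
# Second relation: L+ is the unlink, L0 the unknot, L- the Hopf link.
hopf = solve_for_minus(unlink, ONE)
# Third relation: L+ is the unknot, L0 the Hopf link, L- the trefoil.
trefoil = solve_for_minus(ONE, hopf)

print(sorted(trefoil.items()))
```

Reading the result back as a polynomial, the entries correspond to the terms $-v^{-4}$, $2v^{-2}$, and $z^2v^{-2}$.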

The trefoil and the figure eight thus have different
polynomials; this proves they are different knots.
Experimentally, if a trefoil is actually made from a necklace
(using the clasp to join the ends together) it is indeed
found to be impossible to move it to the configuration
of a figure eight knot. Note that the polynomial of a
knot is not dependent on the choice of its orientation
(but this is not so for links).

Reflecting a knot in a mirror is equivalent to changing
every crossing in a diagram of the knot from an
over-crossing to an under-crossing and vice versa (consider
the plane of the diagram to be the mirror). The
polynomial of the reflection is always the same as that
of the original knot except that every occurrence of $v$
must be replaced by one of $-v^{-1}$. Thus the trefoil and
its reflection,

[Figure: the trefoil and its reflection.]

have polynomials

$$-v^{-4} + 2v^{-2} + z^2v^{-2} \quad \text{and} \quad -v^4 + 2v^2 + z^2v^2.$$

As these polynomials are not the same, the trefoil and 
its reflection are different knots. 


2 Other Polynomial Invariants 


The HOMFLY polynomial was inspired by the discovery
in 1984 of the polynomial of V. F. R. Jones. For
an oriented link $L$, the Jones polynomial $V(L)$ has just
one variable $t$ (together with $t^{-1}$). It is obtained from
$P(L)$ by substituting $v = t$ and $z = t^{1/2} - t^{-1/2}$, where
$t^{1/2}$ is just a formal square root of $t$. The Alexander
polynomial is obtained by the substitution $v = 1$,
$z = t^{-1/2} - t^{1/2}$. This latter polynomial is well understood
in terms of topology, by way of the fundamental
group, covering spaces, and homology theory, and can
be calculated by various methods involving determinants.
It was J. H. Conway who, in discussing in 1969 his
normalized version of the Alexander polynomial (the
polynomial in one variable $z$ obtained by substituting
$v = 1$ into the HOMFLY polynomial), first developed
the theory of skein relations.

There is one more polynomial (due to L. H. Kauffman)
based on a linear skein relation. The relation
involves four links with unoriented diagrams differing
as follows:

[Figure: four unoriented link diagrams that differ only near one crossing.]

There are examples of pairs of knots that the Kauffman
polynomial but not the HOMFLY polynomial can distinguish
and vice versa; some pairs are not distinguished
by any of these polynomials.


2.1 Application to Alternating Knots 

For the Jones polynomial there is a particularly simple
formulation, by means of “Kauffman’s bracket polynomial,”
that leads to an easy proof that the Jones (but not
the HOMFLY) polynomial is coherently defined. This
approach has been used to give the first rigorous confirmation
of P. G. Tait’s (1898) highly believable proposal
that a reduced alternating diagram of a knot has
the minimal number of crossings for any diagram of
that knot. Here “alternating” means that in going along
the knot the crossings go: ... over, under, over, under,
over, .... Not every knot has such a diagram. “Reduced”
means that there are, adjacent to each crossing, four
distinct regions of the diagram’s planar complement.
Thus, for example, any nontrivial reduced alternating
diagram is not a diagram of the unknot. Also, the figure
eight knot certainly has no diagram with only three
crossings.

2.2 Physics

Unlike that of Alexander, the HOMFLY polynomial has
no known interpretation in terms of classical algebraic
topology. It can, however, be reformulated as a collection
of state sums, summing over certain labelings
of a knot diagram. This recalls ideas from statistical
mechanics; an elementary account is given in Kauffman
(1991). An amplification of the whole HOMFLY polynomial
theory leads into a version of conformal field
theory called topological quantum field theory.

Further Reading 

Kauffman, L. H. 1991. Knots and Physics. Singapore: World 
Scientific. 

Lickorish, W. B. R. 1997. An Introduction to Knot Theory. 
Graduate Texts in Mathematics, volume 175. New York: 
Springer. 

Tait, P. G. 1898. On knots. In Scientific Papers, volume I, 
pp. 273-347. Cambridge: Cambridge University Press. 


III.47 K-Theory


K-theory concerns one of the most important invariants
of a topological space [III.92] $X$, a pair of groups
called the K-groups of $X$. To form the group $K^0(X)$ one
takes all (equivalence classes of) vector bundles on $X$,
and uses the direct sum as the group operation. This
leads not to a group but to a semigroup. However, from
the semigroup one can easily construct a group in the
same way that one constructs $\mathbb{Z}$ out of $\mathbb{N}$: by taking
equivalence classes of expressions of the form $a - b$. If $i$
is a positive integer, then there is a natural way of defining
a group $K^{-i}(X)$: it is closely related to the group
$K^0(S^i \times X)$. The very important Bott periodicity theorem
says that $K^i(X)$ depends only on the parity of $i$, so
there are in fact just two distinct K-groups, $K^0(X)$ and
$K^1(X)$. See algebraic topology [IV.10 §6] for more
details.

If $X$ is a topological space such as a compact manifold,
then one can associate with it the $C^*$-algebra $C(X)$
of all continuous functions from $X$ to $\mathbb{C}$. It turns out to
be possible to define the K-groups in terms of this algebra
in such a way that it applies to algebras that are not
of the form $C(X)$. In particular, it applies to algebras
where multiplication is not commutative. For instance,
K-theory provides important invariants of $C^*$-algebras.
See OPERATOR ALGEBRAS [IV.19 §4.4].


Lagrange Multipliers 

See OPTIMIZATION AND LAGRANGE 
MULTIPLIERS [III.66] 


III.48 The Leech Lattice 


To define a lattice in $\mathbb{R}^d$ one chooses $d$ linearly independent
vectors $v_1, \ldots, v_d$ and takes all combinations
of the form $a_1v_1 + \cdots + a_dv_d$, where $a_1, \ldots, a_d$ are
integers. For example, to define the hexagonal lattice
in $\mathbb{R}^2$ one can take $v_1$ and $v_2$ to be $(1, 0)$ and $(\frac{1}{2}, \frac{\sqrt{3}}{2})$,
respectively. Notice that $v_2$ is $v_1$ rotated by $\pi/3$, and
also that $v_2 - v_1$ is $v_2$ rotated by $\pi/3$. Continuing this
process, one can generate all the points in a regular
hexagon about the origin.
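
The two rotation claims, and the fact that the hexagon's vertices all lie in the lattice, can be checked numerically. The following Python sketch is illustrative only; `rotate`, `close`, and `coordinates` are hypothetical helpers, and floating-point comparisons use a small tolerance.

```python
import math

def rotate(v, angle):
    """Rotate the plane vector v counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def close(u, w, tol=1e-12):
    return all(abs(a - b) <= tol for a, b in zip(u, w))

v1 = (1.0, 0.0)
v2 = (0.5, math.sqrt(3) / 2)

def hexagon_vertices():
    """v1 together with its successive rotations by pi/3."""
    vertices = [v1]
    for _ in range(5):
        vertices.append(rotate(vertices[-1], math.pi / 3))
    return vertices

def coordinates(v):
    """Coefficients (a, b) with v = a*v1 + b*v2."""
    b = v[1] / v2[1]
    a = v[0] - b * v2[0]
    return (a, b)

for vertex in hexagon_vertices():
    print([round(c) for c in coordinates(vertex)])
```

Each vertex has integer coefficients with respect to $v_1$ and $v_2$, so the whole hexagon sits inside the lattice.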

The hexagonal lattice is unusual, among lattices in
$\mathbb{R}^2$, in that it has a rotational symmetry of order 6. This
makes it the “best” lattice in many ways. (For example,
bees arrange their hives in hexagonal lattices, soap
bubbles of similar sizes naturally organize themselves
into hexagonal lattices, and so on.) The Leech lattice
plays a similar role in twenty-four dimensions: it is the
“most symmetrical” of all twenty-four-dimensional lattices,
with a degree of symmetry that is quite extraordinary.
It is discussed in more detail in THE GENERAL
GOALS OF MATHEMATICAL RESEARCH [I.4 §4].


III.49 L-Functions 

Kevin Buzzard 


1 How Can We “Package” a Sequence of Numbers?

Suppose we are given a sequence of numbers such as
$\pi$, 42, $6.023 \times 10^{23}$, ....

How can we package up this sequence into one object 
that remembers everything about the sequence, and 
that might even give us new insights into the sequence? 
One standard technique is to use a generating func- 
tion [III.32], but here is another way, which has proved 
very fruitful in number theory and elsewhere. Given a 
sequence $a_1, a_2, a_3, \ldots$, we define the Dirichlet series

$$L(s) = \frac{a_1}{1^s} + \frac{a_2}{2^s} + \frac{a_3}{3^s} + \cdots = \sum_{n \ge 1} a_n/n^s.$$

Here, $s$ could be a positive integer, or a real number, for
example. As long as our sequence $a_1, a_2, \ldots$ does not
grow too quickly (which we shall henceforth assume),
the series $L(s)$ will converge for all sufficiently large
values of $s$. Moreover, it may be a very “rich” object,
even if the initial sequence is simple. For example, if
$a_n = 1$ for all $n$, then the resulting function $L(s)$ is
the famous Riemann zeta function [IV.4 §3] $\zeta(s) =
1^{-s} + 2^{-s} + 3^{-s} + \cdots$, which converges when $s > 1$ and
was shown by Euler to satisfy the following identities,
among others (there is one for each even number):

$$\zeta(2) = \pi^2/6, \qquad \zeta(4) = \pi^4/90, \qquad \zeta(12) = \frac{691\pi^{12}}{638512875}.$$

Thus, even a sequence as simple as $1, 1, 1, \ldots$ leads us
to some natural questions that cry out to be answered.

The zeta function is the prototypical example of an
L-function. However, not every Dirichlet series deserves
to be called an L-function. We will mention below some
“good” properties that the zeta function has: roughly
speaking, a Dirichlet series is considered to be an
L-function if it has these good properties. This is not
a formal definition of course, but in fact there is no
formal definition of “an L-function.” (People have tried
to give one, but there is no real consensus about what
the right definition should be.) What happens in practice
is that a mathematician finds a way of associating
a sequence $a_1, a_2, \ldots$ of numbers with a mathematical
object $X$, and if evidence then emerges to suggest
that the associated Dirichlet series $L(s)$ shares the good
properties of the zeta function, then $L(s)$ will be called
the L-function of $X$.


2 What Good Properties Might $L(s)$ Have?

One can check that the zeta function can also be
expressed as an infinite product over primes $\zeta(s) =
\prod_p (1 - p^{-s})^{-1}$. The product is usually referred to as
an Euler product, and if a Dirichlet series is to deserve
the title of L-function, then it should have some kind
of analogous product expansion. The existence of such
an expansion is closely related to, but a little stronger
than, the property that the sequence $a_1, a_2, \ldots$ should
be multiplicative, which means that $a_{mn} = a_m a_n$
whenever $m$ and $n$ are coprime.
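
A numerical comparison makes the Euler product plausible. The Python sketch below is an illustration, not a proof; the function names and the cutoff of 10000 are assumptions. It evaluates a partial sum of the Dirichlet series and a partial product over primes at $s = 2$ and compares both with $\pi^2/6$.

```python
import math

def zeta_partial(s, terms):
    """Partial sum 1^(-s) + 2^(-s) + ... + terms^(-s)."""
    return sum(n ** -s for n in range(1, terms + 1))

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def euler_product_partial(s, limit):
    """Product of (1 - p^(-s))^(-1) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** -s
    return product

target = math.pi ** 2 / 6
print(zeta_partial(2, 10000), euler_product_partial(2, 10000), target)
```

Both truncations approach the same value, the sum from below and the product from below as well, since every omitted term or factor would increase it.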

To go further we must expand our horizons. It is not
hard to show that our definition of $L(s)$ makes sense
even when $s$ is a complex number, as long as it has a
sufficiently large real part. Moreover, it defines a holomorphic
function [I.3 §5.6] in the region of the complex
plane where the sum converges. For example, the
Dirichlet series defining the zeta function converges for
every $s$ such that $\mathrm{Re}(s) > 1$. A standard fact about the
zeta function is that it has a unique extension to a holomorphic
function of $s$ for any complex number $s \ne 1$.
This phenomenon is known as meromorphic continuation
of the zeta function. It is similar to the fact that the
infinite sum $1 + x + x^2 + x^3 + \cdots$ converges only when
$|x| < 1$ but, when rewritten as $1/(1 - x)$, has a natural
interpretation for any complex number $x$ other than 1.
A meromorphic continuation is another of the properties
that one would expect of a general L-function. It is
important to stress, however, that extending a Dirichlet
series to a function on the whole complex plane is
not a “purely formal” technique: for a random sequence
$a_1, a_2, \ldots$ there is no reason at all for the associated
Dirichlet series $L(s)$ to have a natural extension beyond
the region where the series converges. The existence
of a meromorphic continuation is somehow a rigorous
way of asserting the existence of subtle symmetries in
the series.

While on the subject of meromorphic continuation,
we should briefly mention the Riemann hypothesis
[V.33], a conjecture which states that, once one has
extended $\zeta(s)$ to a function on the whole complex
plane, the complex numbers $s$ such that $0 < \mathrm{Re}(s) < 1$
and $\zeta(s) = 0$ all have real part equal to $\frac{1}{2}$. There are
analogous Riemann hypotheses for many L-functions,
almost all of which are open problems.

The final property we shall emphasize is that there is
a relatively simple formula relating $\zeta(s)$ and $\zeta(1 - s)$.
This relation is called the functional equation of the
zeta function, and any Dirichlet series worthy of the
name L-function should also have an analogous property.
(In general one looks for a relation between $L(s)$
and $\bar{L}(k - s)$, where $k$ is some real number and $\bar{L}(s)$
is the Dirichlet series associated with the sequence of
complex conjugates $\bar{a}_1, \bar{a}_2, \ldots$.)

There are many examples of Dirichlet series arising 
in number theory that do have, or are at least conjec- 
tured to have, these three key properties: an Euler prod- 
uct, meromorphic continuation, and a funetional equa- 
tion. These are the Dirichlet series that have come to be 
known as I-functions. For example, if A and B are inte- 
gers such that the three roots of the cubic polynomial 
x 3 +Ax +B are distinet, then the equation 

y 2 =x 3 +Ax + B (1) 

defines an elliptic curve [III.21], and there is a
natural sequence $a_1, a_2, \ldots$ associated with it ($a_n$ is
related to the number of solutions of (1) modulo $n$,
at least when $n$ is prime; see arithmetic geometry
[IV.6 §5.1] for more details). However, it was an open
problem for years to establish the existence of a meromorphic
continuation of the associated Dirichlet series
$L(s)$ to the complex plane: it is now known to exist (and
indeed to have no poles) as a consequence of the work
of Wiles, Taylor, and others that grew out of the proof
of Fermat's last theorem [V.12].
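
To make the phrase "number of solutions of (1) modulo n" concrete, here is a minimal Python sketch that counts the pairs (x, y) modulo a prime p. The function names and the normalization a_p = p - (number of solutions) are assumptions made for illustration; the text itself only says that a_n is "related to" this count.

```python
def count_solutions(A, B, p):
    """Count pairs (x, y) with 0 <= x, y < p and y^2 = x^3 + A*x + B (mod p)."""
    # square_counts[r] = number of y in 0..p-1 with y*y congruent to r mod p.
    square_counts = [0] * p
    for y in range(p):
        square_counts[(y * y) % p] += 1
    # For each x, add the number of y whose square matches the right-hand side.
    return sum(square_counts[(x**3 + A * x + B) % p] for x in range(p))


def a_p(A, B, p):
    """Assumed convention (not stated in the text): a_p = p - #solutions mod p."""
    return p - count_solutions(A, B, p)


if __name__ == "__main__":
    # The curve y^2 = x^3 - x has three distinct roots 0, 1, -1.
    for prime in (3, 5, 7, 11, 13):
        print(prime, count_solutions(-1, 0, prime), a_p(-1, 0, prime))
```

The brute-force count takes time proportional to p, which is fine for small primes but is not how such sequences are computed in practice.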

3 What Is the Point of L-Functions? 

One of the first uses of L-functions was by Dirichlet
[VI.36] himself, who used them to prove that there
are infinitely many primes in a general arithmetic
progression (see ANALYTIC NUMBER THEORY [IV.4 §4]). In
fact, although the Riemann hypothesis is still an open
problem, even partial results about the locations of
the zeros of the Riemann zeta function have deep
consequences in the theory of distribution of prime
numbers.

However, over the last hundred years mathematicians
have realized a second use for them: if $X$ is
a mathematical object and $L(s)$ is its associated
L-function, then there are deep conjectures relating the
arithmetic of $X$ to the values that $L(s)$ assumes, typically
at points where the Dirichlet series defining $L(s)$
does not converge! Hence, one can investigate $X$ by
investigating its L-function. One basic example of this
phenomenon is the Birch-Swinnerton-Dyer conjecture
[V.4], a weak form of which states that the
L-function associated with equation (1) should vanish at
$s = 1$ if and only if (1) has infinitely many solutions
such that both $x$ and $y$ are rational numbers. Much is
known about this conjecture, and it has been vastly generalized
by work of Deligne, Beilinson, Bloch, and Kato.
However, at the time of writing it remains open.


III.50 Lie Theory 

Mark Ronan 


1 Lie Groups 

Why are groups important in mathematics? One major 
reason is that it is often possible to understand a math- 
ematical structure by understanding its symmetries, 
and the symmetries of a given mathematical structure 
form a group. Some mathematical structures are so 
symmetrical that they have not just a finite number 
of symmetries, but a continuous family of them. When 
this is the case, we find ourselves in the realms of Lie 
groups and Lie theory. 


One of the simplest “continuous” groups is the group
SO(2), which consists of all rotations of the plane $\mathbb{R}^2$
about the origin. With each element of SO(2) one can
associate an angle $\theta$: the angle of the rotation in question.
If we write $R_\theta$ for the counterclockwise rotation
by $\theta$, then the group operation is given by $R_\theta R_\varphi =
R_{\theta+\varphi}$, where $R_{2\pi}$ is understood to equal $R_0$, the identity
element of the group.
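
The composition rule for rotations can be checked numerically. The following is a minimal Python sketch, assuming the standard 2 x 2 matrix of a counterclockwise rotation; the helper names `rotation`, `matmul`, and `close` are illustrative and do not come from the text.

```python
import math


def rotation(theta):
    """2x2 matrix of the counterclockwise rotation of the plane by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def close(P, Q, tol=1e-12):
    """True if all corresponding entries agree up to tol."""
    return all(abs(p - q) <= tol
               for row_p, row_q in zip(P, Q) for p, q in zip(row_p, row_q))


if __name__ == "__main__":
    theta, phi = 0.7, 1.9
    # Composing two rotations adds their angles.
    print(close(matmul(rotation(theta), rotation(phi)), rotation(theta + phi)))
    # A rotation by 2*pi is the identity, up to rounding.
    print(close(rotation(2 * math.pi), rotation(0.0)))
```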

The group SO(2) is not just a continuous group, but
also a Lie group. Roughly speaking, this means that it is
a group in which one can meaningfully define the concept
of a smooth curve (that is, a curve that is not just
continuous but differentiable as well). Given any two
elements $R_\theta$ and $R_\varphi$ of SO(2), one can easily define a
smooth path from $R_\theta$ to $R_\varphi$ by smoothly modifying $\theta$
until it becomes $\varphi$. (The most obvious such path would
be given in parametric form by $R_{(1-t)\theta + t\varphi}$, as $t$ goes
from 0 to 1.) It is not always the case that every pair of
points in a Lie group can be connected by a path: when
they can, the Lie group is said to be connected. An example
of a Lie group that is not connected is O(2), which
consists of SO(2) together with all reflections of the
plane about lines through the origin. Any two rotations
can be linked by a path, as can any two reflections, but
there is no continuous way of changing a rotation into
a reflection.

Lie groups were introduced by Sophus Lie in
order to create an analogue of Galois theory [V.24]
for differential equations. Lie groups that consist of
invertible linear transformations of $\mathbb{R}^n$ or $\mathbb{C}^n$, like the
examples above, are called linear Lie groups, and they
are an important subclass. For linear Lie groups it is
fairly easy to work out what terms such as “continuous,”
“differentiable,” or “smooth” should mean. However,
one can also consider more abstract Lie groups
(both real and complex), with elements that are not
given as linear transformations. In order to give a
proper definition of Lie groups in their full generality,
one needs the concept of a smooth manifold [1.3 §6.9].
However, for simplicity we shall mostly restrict attention
to linear Lie groups.

A very common way to create a Lie group is to
collect all transformations of a given space that preserve
one or more specified geometric structures. For
instance, the general linear group $GL_n(\mathbb{R})$ is defined to
be the group of all invertible linear transformations
from $\mathbb{R}^n$ to $\mathbb{R}^n$. Inside this group is the special linear
group $SL_n(\mathbb{R})$, in which we retain only those linear
transformations that preserve volume and orientation
(or equivalently those with determinant [III.15] equal
to 1). If instead we retain the linear transformations
that preserve distance, then we obtain the orthogonal
group O(n); if we retain linear transformations that
preserve both distance and orientation we obtain the
special orthogonal group SO(n), which is easily seen
to equal $SL_n(\mathbb{R}) \cap O(n)$. The Euclidean group E(n) of
rigid motions of $\mathbb{R}^n$ (that is, all transformations that
preserve distances and angles, such as rotations, reflections,
and translations) is generated by the orthogonal
group O(n), together with the group of translations
(which is isomorphic to $\mathbb{R}^n$). There are analogues of
all of the above groups in which the real numbers $\mathbb{R}$
are replaced by the complex numbers $\mathbb{C}$. For instance,
$GL_n(\mathbb{C})$ is the group of all invertible complex-linear
transformations of $\mathbb{C}^n$, and the complex analogue of
the orthogonal group O(n) is the unitary group U(n).
There are also the symplectic groups Sp(2n), which are
analogues of O(n) and U(n) over the quaternions
[III.78]. These are all manifestly linear Lie groups except
for E(n), and in fact it is not difficult to describe a linear
Lie group that is isomorphic to E(n) as well.

Many important examples of Lie groups are finite
dimensional, which roughly means that they can be
described using a finite number of continuous parameters.
(Infinite-dimensional Lie groups, while important,
are more difficult to handle and will not be discussed
in detail here.) For example, the group SO(3), of rotations
of $\mathbb{R}^3$ that fix the origin, is three dimensional. Each
rotation can be specified using three parameters, which
could, for instance, be taken as rotations around the
x-axis, y-axis, and z-axis. These particular parameters
are known to airline pilots as roll, pitch, and yaw, where
the x-axis is in the direction of the airplane. Another
way of specifying each rotation is by its axis and angle
of rotation. Two parameters are needed to specify the
axis (using spherical coordinates for example), and one
parameter is needed to specify the angle of rotation.
Let us take this angle to be between 0 and $\pi$ (a rotation
by an angle greater than $\pi$ has the same effect as
a rotation by an angle less than $\pi$ from the opposite
direction).

We can represent SO(3) geometrically as follows. Let
$B$ be a ball of radius $\pi$ centered at the origin. Given any
noncentral point $P$ of $B$, associate with it the rotation
of $\mathbb{R}^3$ about the axis $OP$ (where $O$ is the origin) through
an angle that is given in radians by the distance from
$O$ to $P$. With $O$ itself we associate the identity map, so
the only ambiguity is that a rotation through $\pi$ radians
is associated with two opposite points $P$ and $P'$ on
the surface of $B$. We can remove this ambiguity by gluing
all such pairs of points together. This tells us what
SO(3) looks like as a topological space [III.92]: it is
equivalent to the three-dimensional projective space
[1.3 §6.7] $\mathbb{RP}^3$. The group SO(2), by comparison, is much
simpler, and is topologically equivalent to a circle.

Lie groups arise naturally in any subject that involves 
continuous motion. For instance, they appear in ap- 
plied topics such as the design of guidance systems 
and also in very pure topics such as geometry or differential
equations. Lie groups, and the closely related Lie
algebras discussed below, also frequently arise in many 
types of algebra, particularly in the algebraic structures 
that appear in quantum mechanics and other related 
branches of physics. 

2 Lie Algebras 

As the examples above show, Lie groups are often 
“curved” and have some nontrivial topology. However, 
one can profitably analyze a Lie group by associating 
with it a flat space known as a Lie algebra. This idea 
is similar to the idea of studying a symmetric object 
such as a sphere by first studying its relationship to 
one of its tangent planes. The Lie algebra uses the tan- 
gent space to the Lie group at the identity element, and 
one can view it as a “logarithm” of the Lie group. 

To see how Lie algebras arise, let us consider a linear
Lie group. The elements of the group can be viewed
as linear transformations on a vector space, or equivalently
(when we have selected a coordinate basis) as
square matrices. In general, two matrices $A$ and $B$ do
not commute (that is, $AB$ does not have to equal $BA$),
but the situation becomes simpler if one looks at matrices
that are very close to the identity matrix $I$. If $A =
I + \varepsilon X$ and $B = I + \varepsilon Y$ for some very small positive $\varepsilon$
and two fixed matrices $X$ and $Y$, then

\[ AB = I + \varepsilon(X + Y) + \varepsilon^2 XY \]

and

\[ BA = I + \varepsilon(X + Y) + \varepsilon^2 YX. \]

Thus, if we ignore the terms containing $\varepsilon^2$, we see
that $A$ and $B$ “almost commute,” and that multiplication
of $A$ and $B$ “almost corresponds to” addition of $X$
and $Y$: indeed, one can view $X$ and $Y$ as analogous to
“logarithms” of $A$ and $B$ respectively.

Let us now informally define the Lie algebra $\mathfrak{g}$ of a linear
Lie group $G$ to be the space of all matrices $X$ such
that, for sufficiently small $\varepsilon$, the matrix $I + \varepsilon X$ lies in
$G$, up to errors of size $\varepsilon^2$. For example, the Lie algebra
$\mathfrak{gl}_n(\mathbb{C})$ of the general linear group $GL_n(\mathbb{C})$ is the space of
all $n \times n$ complex matrices. One can view the Lie algebra
as describing all possible instantaneous directions
and speeds within the group $G$, and a more precise definition
is the collection of all derivatives $R'_0$ of smooth
curves $\varepsilon \mapsto R_\varepsilon$ in $G$ that pass through the identity element
$R_0$. This definition can also be extended to more
abstract Lie groups without much difficulty. (To return
to the example of the airplane pilot, an element of the
Lie group SO(3) could be used to describe the current
orientation of the aircraft with respect to a fixed coordinate
system, whereas an element of the Lie algebra
$\mathfrak{so}(3)$ could be used to describe the current rate of roll,
pitch, and yaw that the pilot is applying to the aircraft
to smoothly change its orientation.)

As we have just seen, the Lie algebra $\mathfrak{gl}_n(\mathbb{C})$ of the
general linear group $GL_n(\mathbb{C})$ is the space of all $n \times n$
complex matrices. The Lie algebra $\mathfrak{sl}_n(\mathbb{C})$ of the special
linear group $SL_n(\mathbb{C})$ is the subspace of all matrices with
trace zero. This is because $\det(I + \varepsilon X) = 1 + \varepsilon \operatorname{tr} X$, up
to errors of size $\varepsilon^2$, so if $\varepsilon \mapsto I + \varepsilon X$ is a path in the
group, then $\operatorname{tr} X = 0$. The Lie algebra $\mathfrak{so}(n)$ of SO(n) is
equal to the Lie algebra $\mathfrak{o}(n)$ of O(n), and both are equal
to the space of all antisymmetric matrices. Similarly,
the Lie algebra $\mathfrak{u}(n)$ of U(n) is the space of skew-Hermitian
matrices, and the Lie algebra $\mathfrak{su}(n)$ of SU(n) is the subspace
of skew-Hermitian matrices with trace zero. (A matrix is
skew-Hermitian if it equals minus
the complex conjugate of its transpose.)
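
For 2 x 2 matrices the expansion of the determinant used above can be written out by hand. In the following worked computation the entry names a, b, c, d are illustrative, not from the text; the coefficient of the first-order term is exactly the trace.

```latex
\[
\det\!\left( I + \varepsilon \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right)
  = (1 + \varepsilon a)(1 + \varepsilon d) - \varepsilon^{2} bc
  = 1 + \varepsilon (a + d) + \varepsilon^{2} (ad - bc).
\]
```

Requiring the determinant to equal 1 up to errors of size $\varepsilon^2$ therefore forces $a + d = 0$, which is the trace-zero condition.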

The fact that a Lie group is closed under multiplication
can be used to show that its Lie algebra is closed
under addition. Thus, a Lie algebra is a (real) vector
space. However, it has some additional structure that
makes it far more than just a vector space. For instance,
let $A$ and $B$ be two elements of the Lie group $G$ that are
very close to the identity. Then we can write $A \approx I + \varepsilon X$
and $B \approx I + \varepsilon Y$ for some very small $\varepsilon$ and some elements
$X$ and $Y$ of the Lie algebra $\mathfrak{g}$. A little matrix algebra
shows that the commutator $ABA^{-1}B^{-1}$ of $A$ and $B$,
which is the element of $G$ that measures the extent to
which $A$ and $B$ fail to commute, can be approximated
by $I + \varepsilon^2[X, Y]$, where $[X, Y] = XY - YX$. This quantity
$[X, Y]$ is called the Lie bracket of $X$ and $Y$. Informally, it
represents the net direction of motion if one first moves
an infinitesimal amount in the $X$ direction, then in the
$Y$ direction, then back in the $X$ direction and back in
the $Y$ direction, in that order. The resulting new direction
may be quite different from the original directions
$X$ and $Y$.
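
The approximation of the group commutator by the Lie bracket can be tested numerically. The sketch below is restricted to 2 x 2 real matrices so that the inverse can be written explicitly; the function names and the particular choice of X and Y are illustrative assumptions. It measures how far the commutator of I + eps X and I + eps Y is from I + eps^2 [X, Y], a gap that should shrink like eps^3.

```python
def matmul(P, Q):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(M):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def near_identity(X, eps):
    """The matrix I + eps * X."""
    return [[(1.0 if i == j else 0.0) + eps * X[i][j] for j in range(2)]
            for i in range(2)]


def lie_bracket(X, Y):
    """[X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(2)] for i in range(2)]


def commutator_error(X, Y, eps):
    """Largest entry of A B A^-1 B^-1 - (I + eps^2 [X, Y])."""
    A, B = near_identity(X, eps), near_identity(Y, eps)
    group_commutator = matmul(matmul(A, B), matmul(inverse(A), inverse(B)))
    approximation = near_identity(lie_bracket(X, Y), eps ** 2)
    return max(abs(group_commutator[i][j] - approximation[i][j])
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    X = [[0, 1], [0, 0]]
    Y = [[0, 0], [1, 0]]
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, commutator_error(X, Y, eps))
```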

The Lie bracket obeys a number of nice identities,
such as the antisymmetric identity $[X, Y] = -[Y, X]$
and the Jacobi identity

\[ [[X, Y], Z] + [[Y, Z], X] + [[Z, X], Y] = 0. \]

One can in fact use such identities to define Lie algebras
in a completely abstract fashion, without any reference
to matrices or Lie groups, in much the same way that
other algebraic objects such as groups, rings, and fields
can be defined using a handful of algebraic identities as
axioms, but we shall not focus on the abstract approach
to Lie algebras here. A familiar example of a Lie algebra
is $\mathbb{R}^3$ with the Lie bracket $[x, y]$ defined to be the
cross-product $x \times y$. Notice that the Lie bracket does
not satisfy the associative law (unless it is trivial).
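
The cross-product example can be checked directly. The following Python sketch uses integer vectors so that all arithmetic is exact; the helper names are illustrative. It verifies antisymmetry and the Jacobi identity on sample vectors and exhibits a failure of associativity.

```python
def cross(x, y):
    """Cross product of two vectors in R^3, given as 3-tuples."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])


def add(*vectors):
    """Componentwise sum of any number of 3-tuples."""
    return tuple(sum(components) for components in zip(*vectors))


def negate(x):
    return tuple(-c for c in x)


def jacobi_sum(x, y, z):
    """[[x, y], z] + [[y, z], x] + [[z, x], y] with [.,.] the cross product."""
    return add(cross(cross(x, y), z),
               cross(cross(y, z), x),
               cross(cross(z, x), y))


if __name__ == "__main__":
    x, y, z = (1, 2, 3), (-4, 0, 5), (2, -7, 1)
    print(cross(x, y) == negate(cross(y, x)))   # antisymmetry
    print(jacobi_sum(x, y, z))                  # Jacobi identity
    e1, e2 = (1, 0, 0), (0, 1, 0)
    # Associativity fails: (e1 x e1) x e2 differs from e1 x (e1 x e2).
    print(cross(cross(e1, e1), e2), cross(e1, cross(e1, e2)))
```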

We have seen that a linear Lie group $G$ naturally generates
the bracket operation $[\cdot, \cdot]$ on its Lie algebra
$\mathfrak{g}$. Conversely, if the Lie group is connected, one can
almost reconstruct it from the Lie algebra, with its addition,
scalar multiplication, and Lie bracket operation.
More precisely, every element $A$ of the Lie group can
be written as an exponential [III.25] $\exp(X)$ of an element
$X$ of the Lie algebra. For example, if the Lie group
is SO(2), then we can identify it with the unit circle in $\mathbb{C}$.
The tangent to this circle at 1 is a vertical line, so we can
identify the Lie algebra with the set $i\mathbb{R}$ of purely imaginary
numbers. (Normally, however, we would just say
that the Lie algebra is $\mathbb{R}$.) The rotation through an angle
$\theta$ can then be written as $\exp(i\theta)$. Note that this representation
is not unique, since $\exp(i\theta) = \exp(i(\theta + 2\pi))$.
It is not hard to see that the Lie group $\mathbb{R}$ also has $\mathbb{R}$ as its
Lie algebra (to make sense of this it helps to replace $\mathbb{R}$
by the multiplicative group of positive real numbers,
which is isomorphic to $\mathbb{R}$), and that in this case the
representation of a group element as an exponential
is unique. In general, if two connected Lie groups have
the same Lie algebra, then those Lie groups share the
same universal cover, and are therefore closely related
to one another.

In the case of linear Lie groups, the exponential can
be described by the familiar formula

\[ \exp(X) = \lim_{n \to \infty} \left( I + \frac{1}{n} X \right)^{n}. \]
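
As a rough numerical illustration of the limit formula exp(X) = lim (I + X/n)^n, the Python sketch below applies it to the matrix X = [[0, -theta], [theta, 0]], which is the element of the Lie algebra of SO(2) corresponding to the angle theta, and compares the result with the rotation by theta. Taking n to be a power of 2 and using repeated squaring is a convenience chosen here, not something the text prescribes; the function names are illustrative.

```python
import math


def matmul2(P, Q):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_by_limit(X, doublings=20):
    """Approximate exp(X) by (I + X/n)^n with n = 2**doublings."""
    n = 2 ** doublings
    M = [[(1.0 if i == j else 0.0) + X[i][j] / n for j in range(2)]
         for i in range(2)]
    # Squaring `doublings` times raises I + X/n to the power n.
    for _ in range(doublings):
        M = matmul2(M, M)
    return M


def rotation_generator(theta):
    """Antisymmetric matrix whose exponential is the rotation by theta."""
    return [[0.0, -theta], [theta, 0.0]]


if __name__ == "__main__":
    theta = 1.0
    M = exp_by_limit(rotation_generator(theta))
    print(M[0][0], math.cos(theta))
    print(M[1][0], math.sin(theta))
```

The agreement is only approximate for finite n: the error decreases roughly in proportion to 1/n.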

For more abstract Lie groups, the exponential is best
described in the language of ordinary differential equations,$^{1}$
using a suitable generalization of the identity

\[ \frac{d}{dt} e^{tx} = x e^{tx} \]

from single-variable calculus. However, owing to the
noncommutativity of the Lie group, it is not quite
true that $\exp(X + Y)$ equals $\exp(X)\exp(Y)$; instead,
the correct identity is the Baker-Campbell-Hausdorff
formula

\[ \exp(X)\exp(Y) = \exp\bigl( X + Y + \tfrac{1}{2}[X, Y] + \cdots \bigr), \]

where the missing terms consist of a moderately complicated
infinite series involving the Lie bracket. The
exponential map that connects Lie algebras and Lie
groups is closely related to the Lie bracket, and because
of this it is possible to study and classify Lie groups by
first studying and classifying Lie algebras with their Lie
bracket operation.

3 Classification 

It is always of interest when a mathematical structure 
can be classified, but especially so if the structure is 
important and the classification is not straightforward. 
By these criteria, the results that have been obtained 
concerning the classification of Lie algebras are unde- 
niably interesting, and they are regarded as one of the 
great mathematical achievements from around the turn 
of the twentieth century. 

It turns out to be easier to classify complex Lie algebras:
that is, Lie algebras such as $\mathfrak{sl}_n(\mathbb{C})$ that have
the structure of a complex vector space. Each real Lie
algebra embeds in a complex Lie algebra of twice the 
(real) dimension, known as the complexification of the 
original algebra. However, a complex Lie algebra may 
arise as the complexification of several different real 
Lie algebras (known as real forms of the complex Lie 
algebra). 

In classifying Lie groups and Lie algebras, the first 
step is to restrict attention to simple Lie groups and Lie 
algebras; these are analogous to prime numbers in the 
sense that they cannot be “factored” into smaller components.
For instance, the Euclidean group E(n) contains
the translation group $\mathbb{R}^n$ as a connected normal
subgroup. If we factor out this group, then we obtain 
the orthogonal group O(n), so E(n) is not simple. More 


1. Indeed, Lie groups and Lie algebras are an excellent tool for
describing the algebraic aspects of ordinary and partial differential
equations; the evolution of such equations through time can be modeled
using a Lie group, and the differential operators used to describe
an equation can be modeled on the associated Lie algebra. However,
we will not discuss this important connection between Lie theory and
differential equations here.


formally, a Lie group is simple if it contains no proper
connected normal subgroups, and a Lie algebra is simple
if it contains no proper ideals [III.83 §2]. In this
sense, the Lie group $SL_n(\mathbb{C})$ and its Lie algebra $\mathfrak{sl}_n(\mathbb{C})$
are simple for every $n$. Finite-dimensional, complex,
simple Lie algebras were classified by Wilhelm Killing
and Élie Cartan [VI.69] in 1888-94.

This classification is often placed in the context of so-called
semisimple Lie algebras, which can be factored in
a unique way (up to rearrangement) as a direct sum of
simple Lie algebras, just as a natural number can be
factored uniquely as a product of prime numbers. Furthermore,
a theorem of Levi shows that a general finite-dimensional
Lie algebra $\mathfrak{g}$ can be expressed as a combination
(or, more precisely, a “semidirect product”) of a
semisimple algebra (called a Levi subalgebra of $\mathfrak{g}$) and
a solvable subalgebra (known as the radical of $\mathfrak{g}$). Solvable
Lie algebras, which are related to the concept of a
solvable group [V.24] in group theory, are difficult to
classify, but in many applications one can restrict attention
to semisimple Lie algebras, and hence to simple Lie
algebras.

A simple Lie algebra $\mathfrak{g}$ splits into smaller subalgebras,
which are not ideals but which are related to one
another in particularly nice ways. The case of $\mathfrak{sl}_{n+1}$
is typical and we shall use it to explain the general
theory. It comprises all $(n + 1) \times (n + 1)$ matrices of
trace zero, and splits as a direct sum in the following
way:

\[ \mathfrak{sl}_{n+1} = \mathfrak{n}_- \oplus \mathfrak{h} \oplus \mathfrak{n}_+, \]

where $\mathfrak{h}$ is the set of diagonal matrices of trace zero,
and $\mathfrak{n}_+$ and $\mathfrak{n}_-$ are, respectively, the sets of upper and
lower triangular matrices with 0s on the diagonal. Two
diagonal matrices $X$ and $Y$ commute with one another,
so their Lie bracket $[X, Y] = XY - YX$ is 0. In other
words, if $X$ and $Y$ belong to $\mathfrak{h}$, then $[X, Y] = 0$. A Lie
algebra in which $[X, Y] = 0$ for any two elements $X$ and
$Y$ is called Abelian.

Each simple Lie algebra $\mathfrak{g}$ has a similar decomposition
where the subspace $\mathfrak{h}$ is a maximal Abelian subalgebra
called a Cartan subalgebra. (For Lie algebras that
are not simple, the definition of Cartan subalgebras is
more complicated.) Cartan subalgebras are important
because their action on the rest of the Lie algebra can be
simultaneously diagonalized. What this means is that a
complement to $\mathfrak{h}$ can be split up into one-dimensional
components $\mathfrak{g}_\alpha$, known as root spaces, that are invariant
under the action of $\mathfrak{h}$. To put this another way, if $X$
belongs to $\mathfrak{h}$ and $Y$ belongs to a root space, then $[X, Y]$
is a scalar multiple of $Y$. (The diagonalization requires
the fundamental theorem of algebra [V.15], which
is why we need to work with complex Lie algebras.)

For $\mathfrak{sl}_{n+1}$ this works as follows. Each root space $\mathfrak{g}_{ij}$ is
the one-dimensional space of matrices whose entries
are 0 except for a single entry in the $i$th row and $j$th
column. If $X \in \mathfrak{h}$ (that is, if $X$ is a diagonal matrix of
trace zero) and $Y \in \mathfrak{g}_{ij}$, then it is not hard to check
that $[X, Y]$ also lies in $\mathfrak{g}_{ij}$. In fact,

\[ [X, Y] = (X_{ii} - X_{jj}) Y. \]

If we identify the diagonal matrix $X$ with the vector
whose $n + 1$ coordinates appear down its diagonal, and
if we write $e_i$ for the vector that is 1 in the $i$th position
and 0 elsewhere, then $X_{ii} - X_{jj}$ can be rewritten
as $\langle e_i - e_j, X \rangle$. We refer to the vectors $e_i - e_j$ as root
vectors.
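
The relation [X, Y] = (X_ii - X_jj) Y for a diagonal matrix X and a matrix Y supported on a single off-diagonal entry can be verified with exact integer arithmetic. In the Python sketch below, indices are 0-based and all helper names are illustrative.

```python
def diagonal(entries):
    """Diagonal matrix with the given diagonal entries."""
    size = len(entries)
    return [[entries[i] if i == j else 0 for j in range(size)]
            for i in range(size)]


def elementary(i, j, size):
    """Matrix with a 1 in row i, column j (0-based) and 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(size)]
            for r in range(size)]


def matmul(P, Q):
    size = len(P)
    return [[sum(P[r][k] * Q[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]


def bracket(X, Y):
    """[X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[r][c] - YX[r][c] for c in range(len(X))] for r in range(len(X))]


def scale(factor, M):
    return [[factor * entry for entry in row] for row in M]


if __name__ == "__main__":
    X = diagonal([2, -3, 1])        # trace zero, so X lies in the Cartan subalgebra
    Y = elementary(0, 2, 3)         # spans the root space for (i, j) = (0, 2)
    print(bracket(X, Y) == scale(X[0][0] - X[2][2], Y))
```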

In general, a complex semisimple Lie algebra $\mathfrak{g}$ can
be completely described by its root vectors $\alpha$ and corresponding
root spaces $\mathfrak{g}_\alpha$. The rank of $\mathfrak{g}$ equals the
dimension of the Cartan subalgebra $\mathfrak{h}$ and also equals
the dimension of the vector space spanned by the root
vectors. For example, $\mathfrak{sl}_{n+1}$ has rank $n$, and its root vectors
are the vectors $e_i - e_j$, as we have just seen. Sets
of root vectors are far from arbitrary: they must obey
some simple but quite restrictive geometric properties.
For instance, if a root vector $\alpha$ is reflected in the hyperplane
perpendicular to another root vector $\beta$, the result
must be a third root vector $s_\beta(\alpha)$, where $s_\beta$ is the reflection
concerned. (To make the notion of “perpendicular”
precise, one needs to define a special inner product on
the Cartan subalgebra, known as the Killing form, but
we shall not discuss this here.) The group generated
by these reflections is called the Weyl group of the Lie
algebra.

The root vectors form what is called a root system, 
and the geometric properties mentioned above allow 
one to classify all root systems, and hence all complex 
semisimple Lie algebras. This classification is given by 
some very simple diagrams called Dynkin diagrams, 
which are shown in figure 1. 

The nodes of the diagram correspond to so-called 
simple roots. Every root is a linear combination of sim- 
ple roots with coefficients that are either all nonnega- 
tive or all nonpositive. The nature of the bond (or lack 
thereof) between two nodes determines the inner prod- 
uct of the corresponding simple roots. If there is no 
bond, then the inner product is 0; if there is a single 
bond, then the root vectors have the same length and 
the angle between them is 120°. In diagrams that have
only single bonds, the root vectors span a set of lines
in $\mathbb{R}^n$ in which the angle between any two lines is either
90° or 60°. In the diagrams $B_n$, $C_n$, $F_4$, and $G_2$ there are
arrows between certain pairs of nodes. The direction of
an arrow is from a long root to a short root: the ratio of
the root lengths is $\sqrt{2}$ in the first three cases and $\sqrt{3}$ in
the case of $G_2$. In these cases there are exactly two root
lengths, but in the single-bond cases all roots have the
same length.

Figure 1 Dynkin diagrams.

The $A_n$ diagram is the one for $\mathfrak{sl}_{n+1}$. The simple roots
are $e_i - e_{i+1}$ for $1 \le i \le n$, going from left to right on the
diagram. Notice that the inner product of two simple
roots is 0 unless they are adjacent on the diagram, in
which case it is $-1$. Each root $e_i - e_j$ is a sum of simple
roots with coefficients all 1 or all $-1$ on a connected
segment of the diagram.
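
These statements about the A_n root system are easy to check by enumeration. The Python sketch below represents roots as integer vectors in R^(n+1) with 0-based indices; the function names are illustrative. It lists the simple roots and all roots, and expresses a root in terms of simple roots by taking partial sums of its coordinates.

```python
def unit(i, size):
    """Standard basis vector e_i (0-based) in R^size."""
    return tuple(1 if k == i else 0 for k in range(size))


def subtract(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simple_roots(n):
    """Simple roots e_i - e_{i+1} of A_n, listed from left to right."""
    return [subtract(unit(i, n + 1), unit(i + 1, n + 1)) for i in range(n)]


def all_roots(n):
    """All roots e_i - e_j with i != j."""
    return [subtract(unit(i, n + 1), unit(j, n + 1))
            for i in range(n + 1) for j in range(n + 1) if i != j]


def simple_root_coefficients(root, n):
    """Coefficients c_k with root = sum over k of c_k * (e_k - e_{k+1})."""
    coefficients, running = [], 0
    for k in range(n):
        running += root[k]          # partial sums of the coordinates
        coefficients.append(running)
    return coefficients


if __name__ == "__main__":
    n = 3
    s = simple_roots(n)
    print([[dot(a, b) for b in s] for a in s])   # 2 on the diagonal, -1 if adjacent
    print(len(all_roots(n)))                     # n * (n + 1) roots
    print(simple_root_coefficients((1, 0, 0, -1), n))
```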

The four infinite families $A_n$, $B_n$, $C_n$, and $D_n$ correspond
to the classical Lie algebras, of which $\mathfrak{sl}_{n+1}(\mathbb{R})$,
$\mathfrak{so}(2n + 1)$, $\mathfrak{sp}(2n)$, and $\mathfrak{so}(2n)$ are real forms. These
are the algebras associated with the classical Lie groups
$SL_{n+1}(\mathbb{R})$, SO(2n + 1), Sp(2n), and SO(2n), respectively.

As mentioned earlier, a simple Lie algebra $\mathfrak{g}$ of rank $n$
decomposes as the direct sum of a Cartan subalgebra of
dimension $n$ plus a set of one-dimensional root spaces,
one for each root. It follows that

\[ \dim \mathfrak{g} = \text{the rank of } \mathfrak{g} + \text{the number of roots}. \]

Here are the dimensions of the simple Lie algebras:
$\dim A_n = n + n(n + 1) = n(n + 2)$,
$\dim B_n = n + 2n^2 = n(2n + 1)$,
$\dim C_n = n + 2n^2 = n(2n + 1)$,
$\dim D_n = n + 2n(n - 1) = n(2n - 1)$,
$\dim G_2 = 2 + 12 = 14$,
$\dim F_4 = 4 + 48 = 52$,
$\dim E_6 = 6 + 72 = 78$,
$\dim E_7 = 7 + 126 = 133$,
$\dim E_8 = 8 + 240 = 248$.

Each node of the diagram corresponds to a simple
root, and hence to a reflection across the hyperplane
perpendicular to that root. This set of reflections generates
the Weyl group $W$ in a particularly elegant way. If
$s_i$ denotes the reflection corresponding to node $i$, then
$W$ is generated by elements $s_i$ of order 2, subject only
to the relations

\[ (s_i s_j)^{m_{ij}} = 1, \]

where $m_{ij}$ is the order of $s_i s_j$ (see [IV.11 §2] for a discussion
of generators and relations). These orders are
determined by the diagram according to the following
rules:

(i) $s_i s_j$ has order 2 if there is no bond;

(ii) $s_i s_j$ has order 3 if there is a single bond;

(iii) $s_i s_j$ has order 4 if there is a double bond; and

(iv) $s_i s_j$ has order 6 if there is a triple bond.

For example, the Weyl group of type $A_n$ is isomorphic
to the symmetric group [III.70] $S_{n+1}$, and one
can take $s_1, \ldots, s_n$ to be the transpositions (1 2), (2 3),
. . . , (n n + 1). Notice that the Dynkin diagrams for the
$B_n$ and $C_n$ root systems yield the same Weyl group.
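
For type A_n the rules above can be checked by brute force. In the Python sketch below, permutations are stored as tuples, the generators are the adjacent transpositions with 0-based positions, and the group they generate is enumerated directly. The function names are illustrative, and the enumeration is only practical for small n.

```python
def transposition(i, size):
    """Adjacent transposition swapping positions i and i+1 (0-based)."""
    p = list(range(size))
    p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)


def compose(p, q):
    """The permutation 'apply q first, then p'."""
    return tuple(p[q[k]] for k in range(len(p)))


def order(p):
    """Smallest k >= 1 such that p composed with itself k times is the identity."""
    identity = tuple(range(len(p)))
    power, k = p, 1
    while power != identity:
        power = compose(power, p)
        k += 1
    return k


def generated_group(generators):
    """Set of all permutations obtainable as products of the generators."""
    identity = tuple(range(len(generators[0])))
    seen, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


if __name__ == "__main__":
    n = 3
    s = [transposition(i, n + 1) for i in range(n)]
    print(order(compose(s[0], s[1])))    # adjacent nodes, joined by a single bond
    print(order(compose(s[0], s[2])))    # non-adjacent nodes, no bond
    print(len(generated_group(s)))       # size of the symmetric group on n + 1 letters
```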

In principle, this classification of root systems leads
to a classification of all semisimple finite-dimensional
Lie algebras and Lie groups. However, there are many
fundamental questions about simple Lie algebras and
Lie groups that remain only partly understood. For
instance, one particularly important aim of Lie theory is 
to understand the linear representations of a given Lie 
group or Lie algebra; roughly speaking, a linear repre- 
sentation is a way of interpreting an abstract Lie group 
or Lie algebra as a linear Lie group or Lie algebra by 
assigning a matrix to each of its elements. While the 
representations of all the simple Lie algebras and Lie 
groups have been classified and described explicitly, 
these descriptions are not always easy to work with, 
and answering basic questions (such as how a given rep- 
resentation decomposes into simpler representations) 
often requires some sophisticated tools from algebraic 
combinatorics. 

The theory of root systems outlined above can also be
extended to an important class of infinite-dimensional
Lie algebras, namely the Kac-Moody algebras. Such
algebras arise in several areas of physics (such as are
described in vertex operator algebras [IV.13]) and
algebraic combinatorics.


III.51 Linear and Nonlinear Waves and Solitons

Richard S. Palais 


1 John Scott Russell and the 
Great Wave of Translation 

To the world at large, John Scott Russell is known 
as the naval architect who designed The Great East- 
ern, a steamship larger than any built before. But long 
after The Great Eastern has been forgotten, Russell will 
be remembered by mathematicians as the man who, 
despite limited mathematical training and background, 
was the first person to recognize the highly important
mathematical concept known as a soliton, which
he referred to as “the great wave of translation.” Here 
is his oft-quoted passage in which he describes how he 
first became acquainted with it: 

I was observing the motion of a boat which was rapidly 
drawn along a narrow channel by a pair of horses, when 
the boat suddenly stopped— not so the mass of water 
in the channel which it had put in motion; it accumulated
round the prow of the vessel in a state of violent
agitation, then suddenly leaving it behind, rolled forward
with great velocity, assuming the form of a large
solitary elevation, a rounded, smooth and well-defined
heap of water, which continued its course along the 
channel apparently without change of form or diminu- 
tion of speed. I followed it on horseback, and over- 
took it still rolling on at a rate of some eight or nine 
miles an hour, preserving its original figure some thirty 
feet long and a foot to a foot and a half in height. Its 
height gradually diminished, and after a chase of one 
or two miles I lost it in the windings of the channel. 
Such, in the month of August 1834, was my first chance 
interview with that singular and beautiful phenomenon 
which I have called the Wave of Translation. 

Russell (1844) 

You may feel that there is nothing unusual about what 
Russell describes here, and indeed many before and 
since have watched this same scenario play out with- 
out noticing anything out of the ordinary. But Russell 
was very familiar with wave phenomena and had a sci- 
entist’s keenly observant eye. What struck him was the 
remarkable stability of the bow wave as it traveled over 
a long distance. He knew that if one tried to create 
a traveling water wave on, say, a calm lake, it would 
soon disperse into a train of smaller wavelets: it would
not just go marching along as a single “heap” over a 
long distance. There was clearly something very special 
about water waves traveling in a narrow and shallow 
channel. 

Russell became fascinated (even a little obsessed)
with his discovery. He built a wave tank behind his
home and proceeded to do extensive experiments,
recording the results as data and sketches in his notebooks.
He found, for example, that the speed of a soliton
depended on its height, and he was even able to discover
the correct formula for the speed as a function
of height. More surprising still, in Russell's notebooks
one finds remarkable sketches of a two-soliton interaction,
something that would evoke amazement when
it was rediscovered as a rigorous solution to the KdV
equation (see section 3 below) more than a hundred
years later.

However, as we shall see, solitons are very much a 
nonlinear phenomenon, and when some of the best 
mathematicians of Russell's day, notably Stokes and
Airy, tried to understand Russell’s observations using 
the linearized theory of water waves that was then 
available, they failed to find any trace of soliton-like 
behavior and expressed doubts that what Russell had 
seen was real. 

It was only after Russell's death, with the more
sophisticated nonlinear mathematical treatment by 
Boussinesq in 1871 and by Korteweg and de Vries in 
1895, that Russell’s careful observations and experi- 
ments were at last seen to be in complete agreement 
with mathematical theory. And it took another seventy 
years before the full importance of the great wave of 
translation was recognized, after which it became an 
object of intensive study for the rest of the twentieth 
century. 

2 The Korteweg-de Vries Equation 

Korteweg and de Vries were the first to derive
the appropriate differential equation to describe the
motion of a wave in a shallow channel. We can write
their equation, usually called the KdV equation, in a
succinct form as follows:

\[ u_t + u u_x + \delta^2 u_{xxx} = 0. \]

Here, $u$ is a function of two variables, $x$ and $t$, which
represent space and time, respectively. “Space” is one
dimensional, so $x$ is a real number, and $u(x, t)$ represents
the height of the wave at $x$ at time $t$. The notation
$u_t$ is shorthand for $\partial u/\partial t$; similarly, $u_x$ stands for
$\partial u/\partial x$ and $u_{xxx}$ stands for $\partial^3 u/\partial x^3$.

This is an example of an evolution equation: if, for 
each t, we write u(t) for the function from R to R that 
takes x to u(x, t), then it describes how the function 
u(t) “evolves” over time. The Cauchy problem for an 
evolution equation is the problem of determining this 
evolution from knowledge of its initial value u(0). 

2.1 Some Model Equations 

To put the KdV equation into perspective, it is useful 
to think briefly about three other evolution equations. 
The first is the classic wave equation [I.3 §5.4]

$$u_{tt} - c^2 u_{xx} = 0.$$

To solve the Cauchy problem for this equation, we
factor the wave operator
$(\partial^2/\partial t^2) - c^2(\partial^2/\partial x^2)$ as a product
$((\partial/\partial t) - c(\partial/\partial x))((\partial/\partial t) + c(\partial/\partial x))$.
Then we transform to so-called characteristic coordinates
$\xi = x - ct$, $\eta = x + ct$. The equation becomes
$\partial^2 u/\partial\xi\,\partial\eta = 0$, which clearly has the general
solution $u(\xi, \eta) = F(\xi) + G(\eta)$. Transforming back to
“laboratory coordinates” $x$, $t$, the general solution is
$u(x, t) = F(x - ct) + G(x + ct)$. If the initial shape
of the wave is $u(x, 0) = u_0(x)$ and its initial velocity
is $u_t(x, 0) = v_0(x)$, then an easy algebraic
computation gives the following very explicit formula:

$$u(x, t) = \tfrac{1}{2}[u_0(x - ct) + u_0(x + ct)] + \frac{1}{2c}\int_{x - ct}^{x + ct} v_0(\xi)\,d\xi,$$

which is known as “d’Alembert’s solution” of the 
Cauchy problem for the wave equation. 

Note the geometric interpretation in the important
“plucked string” case, $v_0 = 0$: the initial profile $u_0$
breaks up into the sum of two “traveling waves,” both
with the same profile $\tfrac{1}{2}u_0$, one traveling to the right,
and the other to the left, both with speed $c$. It is an
easy exercise to derive d’Alembert’s solution using the
following hint: since $u_0(x) = F(x) + G(x)$, we have
$u_0'(x) = F'(x) + G'(x)$, while
$v_0(x) = u_t(x, 0) = -cF'(x) + cG'(x)$.
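
As a numerical sanity check of the plucked-string case, the following Python sketch evaluates d’Alembert’s solution for $v_0 = 0$ and estimates $u_{tt} - c^2 u_{xx}$ by central differences. The Gaussian profile `u0`, the speed `c = 2`, and all function names are illustrative assumptions, not part of the text.

```python
import math

def u0(x):
    """Illustrative initial profile (an assumption): a Gaussian bump."""
    return math.exp(-x * x)

def dalembert(x, t, c=2.0):
    """Plucked-string case v0 = 0: two half-height copies of u0 moving apart."""
    return 0.5 * (u0(x - c * t) + u0(x + c * t))

def wave_residual(x, t, c=2.0, h=1e-3):
    """Central-difference estimate of u_tt - c^2 u_xx at the point (x, t)."""
    mid = dalembert(x, t, c)
    u_tt = (dalembert(x, t + h, c) - 2 * mid + dalembert(x, t - h, c)) / h**2
    u_xx = (dalembert(x + h, t, c) - 2 * mid + dalembert(x - h, t, c)) / h**2
    return u_tt - c**2 * u_xx
```

The residual is small but not exactly zero, because the derivatives are only approximated; at $t = 0$ the function reproduces the initial profile.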

The next equation to think about is 

$$u_t = -u_{xxx}, \qquad (1)$$

which we can obtain from the KdV equation if we drop
the nonlinear term $u u_x$. This equation is not just linear
but also translation invariant (meaning that if $u(x, t)$
is a solution, then so is $u(x - x_0, t - t_0)$ for any
constants $x_0$ and $t_0$). Such equations can be solved using
the Fourier transform [III.27]. Let us try to find a
“plane-wave” solution of the form $u(x, t) = e^{i(kx - \omega t)}$. If
we substitute this into (1), then we obtain the equation

$$-i\omega e^{i(kx - \omega t)} = ik^3 e^{i(kx - \omega t)},$$

and therefore the simple algebraic equation $\omega + k^3 = 0$.
This is called the dispersion relation of (1): with the
help of the Fourier transform it is not hard to show
that every solution is a superposition of solutions of
the form $e^{i(kx - \omega t)}$, and the dispersion relation tells us
how the “wave number” $k$ is related to the “angular
frequency” $\omega$ in each of these elementary solutions.

The function $e^{i(kx - \omega t)}$ represents a wave that travels
at a speed of $\omega/k$, which we have just shown to be
equal to $-k^2$. Therefore, the different plane-wave
components of the solution travel at different speeds: the
higher the angular frequency, the greater the speed. For
this reason, the equation (1) is called dispersive.
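
The substitution above can be checked mechanically. In the following Python sketch, `symbol(k, omega)` is the factor that multiplies the plane wave $e^{i(kx - \omega t)}$ when it is inserted into $u_t + u_{xxx}$; it vanishes exactly when the dispersion relation $\omega + k^3 = 0$ holds. The function names are hypothetical and chosen only for this illustration.

```python
def symbol(k, omega):
    """Factor multiplying e^{i(kx - omega t)} in u_t + u_xxx: -i*omega + (ik)^3."""
    return -1j * omega + (1j * k) ** 3

def omega_of(k):
    """Angular frequency forced by the dispersion relation omega + k^3 = 0."""
    return -k ** 3

def phase_speed(k):
    """Speed omega/k of the plane wave with wave number k (equals -k^2)."""
    return omega_of(k) / k
```

Comparing `phase_speed(1.0)` with `phase_speed(2.0)` shows the dispersive effect: doubling the wave number quadruples the speed.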

What happens if instead we omit the $u_{xxx}$ term from
the KdV equation? Then we obtain the inviscid Burgers
equation

$$u_t + u u_x = 0. \qquad (2)$$

The term $u u_x$ can be rewritten as
$(\partial/\partial x)(\tfrac{1}{2}u^2)$. Let us consider the integral
$\int_{-\infty}^{\infty} u(x, t)\,dx$, which is a function
of $t$. The derivative of this function is
$\int_{-\infty}^{\infty} u_t\,dx$, which equation (2) tells us is equal to

$$-\int_{-\infty}^{\infty} \frac{\partial}{\partial x}\bigl(\tfrac{1}{2}u^2\bigr)\,dx,$$

which equals $[-\tfrac{1}{2}u(x, t)^2]_{-\infty}^{\infty}$. Therefore, if
$\tfrac{1}{2}u(x, t)^2$ vanishes at infinity, then the original expression
$\int_{-\infty}^{\infty} u(x, t)\,dx$ is a “constant of the motion.” We say
that the inviscid Burgers equation is a conservation law.
(The argument we have just used can be used for any
equation of the form $u_t = (F(u))_x$, where $F$ is a smooth
function of $u$ and its partial derivatives with respect to
$x$. This is known as the general conservation law. For
example, taking $F(u) = -(\tfrac{1}{2}u^2 + \delta^2 u_{xx})$ gives rise to
the KdV equation.)

The inviscid Burgers equation (and other conservation
laws where $F$ is a function just of $u$) can be solved
using the method of characteristics. The idea of this
method is to look for smooth curves $(x(s), t(s))$ in the
$xt$-plane along which the solution to the Cauchy problem
is constant. Suppose that $s_0$ is such that $t(s_0) = 0$,
and write $x_0$ for $x(s_0)$. Then the constant value that
the solution $u(x, t)$ will have to take along this curve is
$u(x_0, 0)$, which we also write as $u_0(x_0)$. The derivative
of $u$ along this so-called characteristic curve is
$(d/ds)u(x(s), t(s)) = u_x x' + u_t t'$, so if we want the
solution to be constant along the curve, then we need
this to be 0. Therefore, using the fact that $u_t = -u u_x$,
we find that

$$\frac{dx}{dt} = \frac{x'(s)}{t'(s)} = -\frac{u_t}{u_x} = u(x(s), t(s)) = u_0(x_0),$$

so the characteristic curve is a straight line of slope
$u_0(x_0)$. In other words, $u$ has the constant value
$u_0(x_0)$ along the line $x = x_0 + u_0(x_0)t$.

Note the following geometric interpretation of this
last result: to find the wave profile at time $t$ (i.e., the
graph of the map $x \mapsto u(x, t)$), we translate each
point $(x, u_0(x))$ of the initial profile to the right by the
amount $u_0(x)t$. Suppose we look at a portion of the
initial profile where $u_0$ is decreasing. Then the earlier,
and higher, parts of the initial wave are translated at a
greater speed (since $u_0(x)$ is larger), so that the negative
slope of the wave becomes more negative. Indeed,
after a finite time the earlier part of the wave “catches
up” with the later part, which means that we no longer
have a graph of a function. The first time at which this
sort of problem happens is called the “breaking time,”
since one can visualize it as the breaking of a wave. This
process is usually referred to as shock formation, or
steepening and breaking of the wave profile: once again,
the phenomenon occurs for many other conservation
laws.
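
The breaking time can be estimated directly from the characteristic lines $x = x_0 + u_0(x_0)t$: two neighboring characteristics first meet when $1 + u_0'(x_0)t = 0$, so the breaking time is $-1/\min u_0'$ whenever $u_0$ is somewhere decreasing. This formula is a standard consequence of the construction above rather than a statement made in the text. The sketch below applies it to the illustrative profile $u_0(x) = \cos(\pi x)$ on one period; the grid size and function names are assumptions.

```python
import math

def u0(x):
    """Illustrative initial profile (an assumption): one period of a cosine."""
    return math.cos(math.pi * x)

def du0(x, h=1e-6):
    """Central-difference approximation to the slope u0'(x)."""
    return (u0(x + h) - u0(x - h)) / (2 * h)

def characteristic_position(x0, t):
    """Point reached at time t by the characteristic starting at x0."""
    return x0 + u0(x0) * t

def breaking_time(a=0.0, b=2.0, n=20000):
    """First time neighboring characteristics cross: -1 / min u0' on [a, b]."""
    steepest = min(du0(a + (b - a) * i / n) for i in range(n + 1))
    return -1.0 / steepest if steepest < 0 else math.inf
```

For this profile the steepest descent has slope $-\pi$, so the estimate should be close to $1/\pi$.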

2.2 Split-Stepping 

Now let us return to the KdV equation itself, in the form
$u_t = -u u_x - u_{xxx}$. Why is it that this equation gives
rise to the remarkable stability of the solutions that
was observed experimentally by Russell? Intuitively,
the reason is that there is a balance between the
dispersing effect of the $u_{xxx}$ term and the shock-forming
effect of the $u u_x$ term.

There turns out to be a very general technique
for analyzing balances of this kind. In the
pure-mathematics community it is usually called the Trotter
product formula, while in the applied-mathematics
and numerical-analysis communities it is called
split-stepping. The rough idea is simple: as $t$ increases to
$t + \Delta t$, you first change $u$ to $u - u_{xxx}\Delta t$, as would be
required by the equation $u_t = -u_{xxx}$, and then you
take a further step to $u - u_{xxx}\Delta t - u u_x \Delta t$, the small
change required by the equation $u_t = -u u_x$. To work
out the function $u(t, x)$, you start at the initial function
$u_0$ and take a succession of alternating small steps of
this form. You then take the limit as the step size tends
to zero.
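
The convergence of alternating small steps can be seen in a toy setting that is much simpler than KdV. The Python sketch below (an illustration under stated assumptions, not a KdV solver) splits the linear system $u' = (A + B)u$ with the noncommuting $2 \times 2$ matrices $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$, for which each partial flow and the combined flow are known in closed form. The matrices and all function names are chosen purely for illustration.

```python
import math

def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def flow_A(dt):
    """Exact flow of u' = A u over time dt, with A = [[0, 1], [0, 0]]."""
    return [[1.0, dt], [0.0, 1.0]]

def flow_B(dt):
    """Exact flow of u' = B u over time dt, with B = [[0, 0], [1, 0]]."""
    return [[1.0, 0.0], [dt, 1.0]]

def split_step(t, n):
    """Alternate n small A-steps and B-steps to approximate the flow of A + B."""
    step = matmul(flow_B(t / n), flow_A(t / n))
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(step, result)
    return result

def exact(t):
    """Exact flow of u' = (A + B) u, namely exp(t(A + B))."""
    return [[math.cosh(t), math.sinh(t)], [math.sinh(t), math.cosh(t)]]

def error(t, n):
    """Largest entrywise difference between the split and exact flows."""
    S, E = split_step(t, n), exact(t)
    return max(abs(S[i][j] - E[i][j]) for i in range(2) for j in range(2))
```

Calling `error(1.0, n)` for increasing `n` shows the discrepancy shrinking as the step size tends to zero, which is the content of the Trotter product formula in this finite-dimensional case.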

Split-stepping suggests a way to understand the
mechanism by which dispersion from $u_{xxx}$ balances
shock formation from $u u_x$ in KdV. If we imagine the
evolution of the wave profile as made up of a succession
of pairs of small steps in this way, then when $u$, $u_x$, and
$u_{xxx}$ are not too large, the steepening mechanism will
dominate. But as the time $t$ approaches the breaking
time $T_B$, $u$ remains bounded (since it is made out of
horizontally translated parts of $u_0$). It is not hard to prove
that the maximum slope (that is, the maximum value of
$u_x$) blows up like the function $(T_B - t)^{-1}$, while at the
same place, $u_{xxx}$ blows up like the function $(T_B - t)^{-5}$.
Thus, near the breaking time, and breaking point, the
$u_{xxx}$ term will dwarf the nonlinearity and will disperse
the incipient shock. Thus, the stability is caused by a
kind of negative feedback. Computer simulations show
just such a scenario playing out.

3 Solitons and Their Interactions 

We have just seen that the KdV equation expresses a 
balance between dispersion from its third-derivative 
term and the shock-forming tendency of its nonlinear
term, and in fact many models of one-dimensional
physical systems that exhibit mild dispersion and weak 
nonlinearity lead to KdV as the controlling equation at 
some level of approximation. 

In their 1895 paper, Korteweg and de Vries introduced
the KdV equation and gave a convincing mathematical
argument that this was the equation that governed
wave motion in a shallow canal. They also showed
by explicit computation that it admitted traveling-wave
solutions that had exactly the properties that had been
described by Russell, including the relation of height to
speed that Russell had determined experimentally with
the help of his wave tank.

But it was only much later that further remarkable
properties of the KdV equation became evident. In 
1954, Fermi, Pasta, and Ulam (FPU) used one of the very 
first digital computers to perform numerical exper- 
iments on an elastic string with a nonlinear restor- 
ing force, and their results contradicted the then cur- 
rent expectations of how energy should distribute itself 
among the normal modes of such a system. A decade 
later, Zabusky and Kruskal reexamined the FPU results 
in a famous paper in which they showed that the FPU 
string was well approximated by the KdV equation. 
They then did their own computer experiments, solv- 
ing the Cauchy problem for KdV with initial conditions 
corresponding to those used in the FPU experiments. 
In the results of these simulations they observed the 
first example of a “soliton,” a term that they coined
to describe a remarkable particle-like behavior (elastic
scattering) exhibited by certain KdV solutions. Zabusky 
and Kruskal showed how the coherence of solitons 
explained the anomalous results observed by Fermi, 
Pasta, and Ulam. But in solving that mystery they had 
uncovered a larger one: the behavior of KdV solitons 
was unlike anything seen before in applied mathemat- 
ics, and the search for an explanation of their remark- 
able behavior led to a series of discoveries that changed 
the course of applied mathematics for the next thirty 
years. We shall now fill in some of the mathemati- 
cal details behind the above sketch, beginning with a 
discussion of explicit solutions to the KdV equation. 

To find the traveling-wave solutions of KdV is
straightforward. Here we use the normalized form
$u_t + 6uu_x + u_{xxx} = 0$ of the equation, which can be
obtained by rescaling $u$, $x$, and $t$. First, we substitute
a traveling wave $u(x, t) = f(x - ct)$ into KdV, obtaining
the ordinary differential equation $-cf' + 6ff' + f''' = 0$.
If we add as a boundary condition that $f$ should vanish
at infinity, then a routine computation leads to the following
two-parameter family of traveling-wave solutions:

$$u(x, t) = 2a^2\,\mathrm{sech}^2(a(x - 4a^2 t + \delta)).$$

These are the solitary waves seen by Russell, and they
are now usually referred to as the 1-soliton solutions of
KdV. Note that their amplitude, $2a^2$, is just half their
speed, $4a^2$, while their “width” is proportional to $a^{-1}$.
Thus, taller solitary waves are thinner and move faster.
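
The claim that these profiles solve KdV can be tested numerically. The sketch below assumes the normalization $u_t + 6uu_x + u_{xxx} = 0$ (the one consistent with the amplitude $2a^2$ and speed $4a^2$ quoted above) and estimates the left-hand side by central differences. The step size `h` and the function names are illustrative assumptions.

```python
import math

def soliton(x, t, a=1.0, delta=0.0):
    """1-soliton profile 2 a^2 sech^2(a (x - 4 a^2 t + delta))."""
    return 2 * a * a / math.cosh(a * (x - 4 * a * a * t + delta)) ** 2

def kdv_residual(u, x, t, h=1e-3):
    """Central-difference estimate of u_t + 6 u u_x + u_xxx at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h ** 3)
    return u_t + 6 * u(x, t) * u_x + u_xxx
```

For the soliton the residual is at the level of the discretization error, whereas rescaling the amplitude alone (for example, multiplying the profile by 1.5) produces a residual of order 10, reflecting the rigid link between height, width, and speed.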

Next, following Toda, we will “derive”¹ the 2-soliton
solutions of KdV. Rewrite the 1-soliton solution as
$u(x, t) = 2(\partial^2/\partial x^2)\log\cosh(a(x - 4a^2 t + \delta))$, or
$u(x, t) = 2(\partial^2/\partial x^2)\log K(x, t)$, where
$K(x, t) = 1 + e^{2a(x - 4a^2 t + \delta)}$. We now try to generalize,
looking for solutions of the form
$u(x, t) = 2(\partial^2/\partial x^2)\log K(x, t)$, with
$K(x, t) = 1 + A_1 e^{2\eta_1} + A_2 e^{2\eta_2} + A_3 e^{2(\eta_1 + \eta_2)}$, where
$\eta_i = a_i(x - 4a_i^2 t + \delta_i)$, and we shall choose the $A_i$ and
$\delta_i$ by substituting into KdV and seeing what works. One
can check that KdV is satisfied for $u(x, t)$ of this form
and arbitrary $A_1$, $A_2$, $a_1$, $a_2$, $\delta_1$, $\delta_2$, provided that we
define $A_3 = ((a_2 - a_1)/(a_1 + a_2))^2 A_1 A_2$, and solutions
of KdV arising in this way are called the KdV 2-soliton
solutions.

It can now be shown that for the particular choices
$a_1 = 1$ and $a_2 = 2$ (and suitable $A_1$, $A_2$),

$$u(x, t) = 12\,\frac{3 + 4\cosh(2x - 8t) + \cosh(4x - 64t)}{[\cosh(3x - 36t) + 3\cosh(x - 28t)]^2}.$$

In particular, $u(x, 0) = 6\,\mathrm{sech}^2(x)$, $u(x, t)$ is asymptotically
equal to $2\,\mathrm{sech}^2(x - 4t - \phi) + 8\,\mathrm{sech}^2(2(x - 16t + \tfrac{1}{2}\phi))$
when $t$ is large and negative, and $u(x, t)$ is asymptotically
equal to $2\,\mathrm{sech}^2(x - 4t + \phi) + 8\,\mathrm{sech}^2(2(x - 16t - \tfrac{1}{2}\phi))$
when $t$ is large and positive, where $\phi = \tfrac{1}{2}\log(3)$.

1. This is a complete swindle! Only knowledge of the form of the
solutions allows us to make the clever choice of $K$.

Note what this says. If we follow the evolution from 
-T to T (where T is large and positive), we first see the 
superposition of two 1-solitons: a larger and thinner 
one to the left of, and catching up with, a shorter, fatter, 
and slower-moving one to the right. Around t = 0 they 
merge into a single lump (with the shape $6\,\mathrm{sech}^2(x)$),
and then they separate again, with their original shapes 
restored— but now the taller and thinner one is to the 
right. It is almost as if they had passed right through 
each other. The only effect of their interaction is the 
pair of phase shifts: the slower one is retarded slightly 
from where it would have been, and the faster one is 
slightly ahead of where it would have been. Except for 
these phase shifts, the final result is what we might 
expect from a linear interaction. It is only if we look 
closely at the interaction as the two solitons meet that 
we can detect its highly nonlinear nature. (Note, for 
example, that at time t = 0, the maximum amplitude, 
6, of the combined wave is actually less than the max- 
imum amplitude, 8, of the taller wave when they are 
separated.) But of course the really striking fact is the
resilience of the two individual solitons: their ability 
to put themselves back together after the collision. Not 
only is no energy radiated away, but their actual shapes 
are preserved. (Remarkably, Russell (1844, p. 384) gives 
a sketch of a 2-soliton interaction experiment that he
had carried out in his wave tank!) 
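
The collision described here can be explored with a few lines of Python. The sketch assumes the closed form $u(x, t) = 12\,[3 + 4\cosh(2x - 8t) + \cosh(4x - 64t)] / [\cosh(3x - 36t) + 3\cosh(x - 28t)]^2$ for the 2-soliton solution with parameters 1 and 2, and the phase shift $\phi = \tfrac{1}{2}\log 3$; the function names and sample times are illustrative choices.

```python
import math

def sech2(z):
    """Square of the hyperbolic secant."""
    return 1.0 / math.cosh(z) ** 2

def two_soliton(x, t):
    """Closed-form 2-soliton solution assumed in the lead-in above."""
    num = 3 + 4 * math.cosh(2 * x - 8 * t) + math.cosh(4 * x - 64 * t)
    den = (math.cosh(3 * x - 36 * t) + 3 * math.cosh(x - 28 * t)) ** 2
    return 12 * num / den

PHI = 0.5 * math.log(3)  # phase shift of the slower soliton

def slow_peak(t):
    """Predicted position of the amplitude-2 soliton long after the collision."""
    return 4 * t - PHI

def fast_peak(t):
    """Predicted position of the amplitude-8 soliton long after the collision."""
    return 16 * t + PHI / 2
```

Evaluating `two_soliton` at `slow_peak(3.0)` and `fast_peak(3.0)` returns values very close to 2 and 8, while at $t = 0$ the profile is the single lump $6\,\mathrm{sech}^2(x)$ of height 6.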

Now back to the computer experiment of Zabusky
and Kruskal. For numerical reasons, they chose to deal
with the case of periodic boundary conditions: in effect,
studying the KdV equation $u_t + u u_x + \delta^2 u_{xxx} = 0$
(which they label (1)) on the circle instead of on the
line. For their published report, they chose $\delta = 0.022$
and used the initial condition $u(x, 0) = \cos(\pi x)$. With
the above background in mind, it is interesting to read
the following extract from their 1965 report, which
contains the first use of the term “soliton”:

(I) Initially the first two terms of Eq. (1) dominate and
the classical overtaking phenomenon occurs; that is u
steepens in regions where it has negative slope. (II) Second,
after u has steepened sufficiently, the third term
becomes important and serves to prevent the formation
of a discontinuity. Instead, oscillations of small
wavelength (of order $\delta$) develop on the left of the front.
The amplitudes of the oscillations grow, and finally
each oscillation achieves an almost steady amplitude
(that increases linearly from left to right) and has the
shape of an individual solitary-wave of (1). (III) Finally,
each “solitary wave pulse” or soliton begins to move
uniformly at a rate (relative to the background value
of u from which the pulse rises) which is linearly
proportional to its amplitude. Thus, the solitons spread
apart. Because of the periodicity, two or more solitons
eventually overlap spatially and interact nonlinearly.
Shortly after the interaction they reappear virtually
unaffected in size or shape. In other words, solitons
“pass through” one another without losing their identity.
Here we have a nonlinear physical process in which
interacting localized pulses do not scatter irreversibly.

Zabusky and Kruskal (1965)


III.52 Linear Operators and Their Properties


1 Some Examples of Linear Operators 

A linear map [I.3 §4.2] between vector spaces
[I.3 §2.3] $V$ and $W$ is a function $T : V \to W$ that satisfies
the condition $T(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 Tv_1 + \lambda_2 Tv_2$. Two
phrases that are used almost interchangeably with “linear
map” are “linear transformation” and “linear operator.”
The former is often used when one wishes to draw
attention to the effect of a linear map on some other
object; for example, one might well choose to use the
word “transformation” to describe geometrical operations
such as reflections or rotations. As for “operator,”
it tends to be the word of choice when the linear map is
between infinite-dimensional spaces, especially when it
is just one of an ensemble of linear maps that form an
algebra. It is these maps that we shall discuss here.

Let us begin with some examples of linear operators.

(i) If $X$ is a Banach space [III.64] whose elements are
infinite sequences, then we can define a “shift” $S$ from $X$
to $X$, which takes the sequence $(a_1, a_2, a_3, \ldots)$ to the
sequence $(0, a_1, a_2, a_3, \ldots)$. (In other words, it puts a
0 at the beginning and shifts the other values of the
sequence one place to the right.) The map $S$ is linear,
and if the norm on $X$ is not too pathological, then $S$ will
be a continuous function from $X$ to $X$.

(ii) If X is a space of functions [III.29] defined on the 
closed interval [0,1] and w is some fixed function, then 
the map M that takes the function f to the product fw 
(which is shorthand for the function $x \mapsto f(x)w(x)$) is
linear, and, provided w is small enough in some appro- 
priate sense, M is a continuous linear map from X to X. 
Such maps are called multipliers. (Note that the prop- 
erty of “being a multiplier” depends not just on the 
space X and the map M but also on the way we choose 
to represent X as a space of functions, so it is not an 
intrinsic property of the map itself.) 

(iii) Another important way of defining linear operators
on function spaces is to use a kernel. This is a function
$K$ of two variables, which can be used to define a linear
map in a way that is similar to the way a matrix can be
used to define a map between finite-dimensional vector
spaces. The following formula uses $K$ to define a linear
map $T$:

$$Tf(x) = \int K(x, y) f(y)\,dy. \qquad (1)$$

Note the formal similarity between this and the formula

$$(Av)_i = \sum_j A_{ij} v_j,$$

which defines the product of a matrix with a column
vector. Once again, $K$ will have to satisfy appropriate
conditions in order for (1) to define a continuous linear
map.

A good example of a linear operator defined by a kernel
is the Fourier transform [III.27] $\mathcal{F}$, which takes a
function in $L^2(\mathbb{R})$ to another such function. It is defined
by the formula

$$(\mathcal{F}f)(\alpha) = \int_{-\infty}^{\infty} f(x) e^{-i\alpha x}\,dx.$$

The kernel in this case is the function $K(\alpha, x) = e^{-i\alpha x}$.

(iv) If $f$ is a differentiable function defined on $\mathbb{R}$, say,
and we write $Df$ for its derivative, then we can think of
$D$ as a linear map, since $D(\lambda f + \mu g) = \lambda Df + \mu Dg$. In
order to regard $D$ as an operator, we need to require $f$
to belong to a suitable function space. The best way
of doing this varies from context to context: choosing
a good function space can be very important and
can raise subtle questions. One way is not to insist
that $D$ is defined for every function in the space, but
just on a dense set of functions, and not to require
that $D$ is continuous. Similarly, many partial differential
operators, such as the gradient [I.3 §5.3] and the
Laplacian [I.3 §5.4], are linear operators when viewed
appropriately.

2 Algebras of Operators 

Although individual operators can be important, linear
operators would not be as interesting as they are if it
were not for the fact that they can be formed into families.
If $X$ is a Banach space, then the set $B(X)$ of all
continuous linear operators from $X$ to itself forms a
structure known as a Banach algebra. Roughly speaking,
this means that it is a Banach space (the norm of
an operator $T$ is defined to be the supremum of $\|Tx\|$
over all $x$ such that $\|x\| \le 1$) in which the elements can
be multiplied as well as added. The product of $T_1$ and
$T_2$ is defined to be the composition $T_1 T_2$, and it is easily
seen to satisfy the inequality $\|T_1 T_2\| \le \|T_1\|\,\|T_2\|$. This
algebra is particularly important when $X$ is a Hilbert
space [III.37] $H$: subalgebras of $B(H)$ have a very rich
structure, which is discussed in operator algebras
[IV.19].

3 Properties of Operators Defined on a Hilbert Space

Unlike a general Banach space, a Hilbert space H has 
an inner product. It is therefore natural to ask that a 
continuous linear operator from H to H should relate 
to the inner product somehow. This basic idea leads to 
several different definitions, each of which picks out an 
important class of operators. 

3.1 Unitary and Orthogonal Maps 

Perhaps the most obvious condition one might require
of an operator $T$ is that it should preserve the inner
product, in the sense that $\langle Tx, Ty \rangle$ should equal $\langle x, y \rangle$
for any two vectors $x$ and $y$. In particular, this implies
that $\|Tx\| = \|x\|$ for every $x$, and therefore that $T$ is
an isometry (that is, a map that preserves distances). If
in addition, $T$ is invertible, which it will be if its image
is the whole of $H$, then $T$ is a unitary map. The unitary
maps form a group. If $H$ is $n$ dimensional, then
this group is an important Lie group [III.50 §1] called
$U(n)$. If $H$ is a real Hilbert space (as opposed to a complex
one), then the word “orthogonal” is used instead
of “unitary” and the corresponding Lie group is called
$O(n)$. When $n = 3$, orthogonal maps are rotations and
reflections, so $O(n)$ is the generalization of the group
of rotations and reflections to $n$ dimensions.

3.2 Hermitian and Self-Adjoint Maps 

Given any operator $T$ from $H$ to $H$, there is an operator
$T^*$ from $H$ to $H$ with the property that
$\langle Tx, y \rangle = \langle x, T^*y \rangle$ for every $x$ and $y$. This operator is unique,
and it is called the adjoint of $T$. A second property
that $T$ can have is that of equaling its own adjoint,
which is the case if and only if $\langle Tx, y \rangle = \langle x, Ty \rangle$ for
every $x$ and $y$. Such operators are called Hermitian or,
when the scalars are real, self-adjoint. A simple source
of examples of Hermitian maps is multipliers on the
space $L^2[0, 1]$, where the function one multiplies by is
bounded and real-valued. As we shall see in a moment,
there is a sense in which these are the only examples.

3.3 Properties of Matrices 

If $H$ is a finite-dimensional space with an orthonormal
basis, then we can form the matrix $A$ of $T$ with respect
to that basis. The various properties of $T$ discussed
above then turn out to be equivalent to properties of the
matrix $A$. The transpose of $A$ is the matrix $A^T$ defined by
$(A^T)_{ij} = A_{ji}$, and the conjugate transpose is the matrix
$A^*$ defined by $(A^*)_{ij} = \overline{A_{ji}}$. An $n \times n$ matrix is unitary
if $AA^*$ is the identity, orthogonal if it is real and $AA^T$
is the identity, Hermitian if $A = A^*$, and self-adjoint if
$A = A^T$ (in which case we say that $A$ is symmetric). The
operator $T$ has one of these four properties if and only if
its matrix $A$ has the corresponding property.
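
These matrix conditions are easy to test on small examples. The Python sketch below checks the unitary and Hermitian conditions for square matrices stored as nested lists; the tolerance, the helper names, and the sample matrices `U` and `H` are illustrative assumptions.

```python
import math

def conj_transpose(A):
    """Conjugate transpose A* of a square matrix given as nested lists."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(n) for j in range(n))

def is_unitary(A):
    """True if A A* is (numerically) the identity."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return close(matmul(A, conj_transpose(A)), identity)

def is_hermitian(A):
    """True if A equals its conjugate transpose."""
    return close(A, conj_transpose(A))

s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]        # unitary but not Hermitian
H = [[2, 1 - 1j], [1 + 1j, 3]]        # Hermitian but not unitary
```

For real matrices the same two functions test the orthogonal and symmetric conditions, since the conjugate transpose then reduces to the ordinary transpose.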

3.4 The Spectral Theorem 

Notice that the adjoint of a unitary operator is the
inverse of that operator. In particular, both unitary and
Hermitian operators commute with their adjoints. An
operator with this property is called normal. Normal
operators are important because of the famous spectral
theorem. If $T$ is a normal operator on a finite-dimensional
space $H$, then the spectral theorem asserts
that $H$ has an orthonormal basis [III.37] of eigenvectors
of $T$. In other words, there is a basis of $H$ consisting
of orthogonal unit vectors, with the property that the
matrix of $T$ with respect to this basis is diagonal. This
is an extremely useful theorem in linear algebra. In general,
if $T$ is a normal operator on a Hilbert space $H$, then
the spectral theorem tells us that there is something
like a “basis” for $H$, with respect to which $T$ is a multiplier.
To put this slightly differently, there is an isometric
isomorphism $\phi$ from $H$ to a Hilbert space $H'$ of functions
that are square-integrable with respect to some
measure [III.57], and the map $\phi T \phi^{-1}$ is a multiplier
on $H'$.

3.5 Projections 

Another important class of maps on a Hilbert space is 
the set of orthogonal projections. In general, an element 
T of an algebra is an idempotent if it has the property 
that T 2 = T. If the algebra is an algebra of operators on 
a space X, then T is called a projection. To see why this 
name is appropriate, note that every x is mapped to the 
subspace TX of X, and all points in that subspace are 
left fixed by T (since T(Tx) = T 2 x = Tx). A projection 
is orthogonal if $Tx$ is always orthogonal to $x - Tx$. This
tells us that $T$ is a projection onto some subspace $Y$ of
H, and that it takes each vector to the nearest point in 
Y, so that the vector x - Tx is orthogonal to the whole 
of the subspace Y. 
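
The difference between a projection and an orthogonal projection shows up already in low dimensions. In the Python sketch below, `project_onto_line` is the orthogonal projection onto the line spanned by a vector `v`, while `oblique` is an idempotent map on the plane that is not orthogonal. Both maps and the test vectors are hypothetical examples chosen for illustration.

```python
def dot(x, y):
    """Standard inner product on real vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def project_onto_line(x, v):
    """Orthogonal projection of x onto the line spanned by the nonzero vector v."""
    scale = dot(x, v) / dot(v, v)
    return [scale * c for c in v]

def oblique(x):
    """Idempotent map (x1, x2) -> (x1 + x2, 0): a projection that is not orthogonal."""
    return [x[0] + x[1], 0.0]

def defect(T, x):
    """Inner product of Tx with x - Tx; zero for every x when T is an orthogonal projection."""
    Tx = T(x)
    return dot(Tx, [a - b for a, b in zip(x, Tx)])
```

Both maps satisfy $T^2 = T$, but only the first makes `defect` vanish for every vector: for instance `defect(oblique, [0.0, 1.0])` equals $-1$.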


III.53 Local and Global in Number Theory

Fernando Q. Gouvêa


Analogy is a powerful tool. When one can see parallels 
between two different theories, this often allows one to 
transport insights from one to the other. The idea of 
studying something “locally” comes from the theory of 
functions. Imported into number theory by way of an 
analogy between functions and numbers, it leads us to 
a whole new kind of number, the p-adic numbers, and 
to the local-global principle, which has become one of 
the guiding ideas of modern number theory. 

1 Studying Functions Locally 

Suppose that we have a polynomial such as

$$f(x) = -18 + 21x - 26x^2 + 22x^3 - 8x^4 + x^5.$$

From the very way the polynomial is written down, we
can see certain things about it. For example, we can
see at once that if we plug in $x = 0$, we get $f(0) = -18$.
Other things are less apparent. For example, to
decide what $f(2)$ or $f(3)$ are, we would have to do some
arithmetic. But if we were to rewrite the polynomial as

$$f(x) = 5(x - 2) - 6(x - 2)^2 - 2(x - 2)^3 + 2(x - 2)^4 + (x - 2)^5,$$

we could see at once that $f(2) = 0$. (Of course, one
needs to check that those two expressions really are
equal!) Similarly, we can check that

$$f(x) = 10(x - 3)^2 + 16(x - 3)^3 + 7(x - 3)^4 + (x - 3)^5$$

and see at once that $f(3)$ is also zero, and in fact that
the polynomial has a double root at $x = 3$.
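
The check that these expressions are equal can be automated. The Python sketch below rewrites a polynomial in powers of $(x - a)$ by repeated synthetic division by $(x - a)$, each remainder being the next local coefficient. The function name `recenter` and the lowest-degree-first coefficient convention are assumptions made for this illustration.

```python
def recenter(coeffs, a):
    """Coefficients of a polynomial in powers of (x - a).

    `coeffs` lists the coefficients in powers of x, lowest degree first.
    """
    work = list(coeffs)
    out = []
    while work:
        # Synthetic division by (x - a), run from the leading coefficient down.
        partial = []
        carry = 0
        for c in reversed(work):
            carry = carry * a + c
            partial.append(carry)
        out.append(partial.pop())        # remainder = value of current polynomial at a
        work = list(reversed(partial))   # quotient, back in lowest-degree-first order
    return out

f = [-18, 21, -26, 22, -8, 1]            # -18 + 21x - 26x^2 + 22x^3 - 8x^4 + x^5
```

With these conventions, `recenter(f, 2)` gives `[0, 5, -6, -2, 2, 1]` and `recenter(f, 3)` gives `[0, 0, 10, 16, 7, 1]`, matching the two local expressions in the text; the leading zeros record the simple root at 2 and the double root at 3.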

One way to think about this is to describe the first
expression as “local at $x = 0$,” because it privileges the
value 0 over all others. Then the other two expressions
are local at 2 and local at 3, respectively. On the other
hand, a formula like

$$f(x) = (x - 2)(x - 3)^2(x^2 + 1)$$

(which is also correct) is clearly more “global.” It tells
us where all the roots are: at 2, 3, and $\pm\sqrt{-1}$, with the
3 being a double root.

The same ideas extend to functions that are not polynomials,
as long as we allow the expressions to be
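infinite. So, for example, let us take

$$g(x) = \frac{x^2 - 5x + 2}{x^3 - 2x^2 + 2x - 4}.$$

Locally at 0, we can write this as

$$g(x) = -\tfrac{1}{2} + x + \tfrac{1}{2}x^2 - \tfrac{3}{8}x^3 - \tfrac{3}{16}x^4 + \tfrac{7}{32}x^5 + \cdots.$$

Or we can write it locally at 2:

$$g(x) = -\tfrac{2}{3}(x - 2)^{-1} + \tfrac{5}{18} + \tfrac{5}{54}(x - 2) - \tfrac{35}{324}(x - 2)^2 + \cdots.$$

Notice that this time we had to use a negative power of
$(x - 2)$, because plugging in $x = 2$ makes the denominator
zero. Nevertheless, the expansion tells us that the
“badness” at 2 is not too bad. Specifically, we can see
that while $g(2)$ is undefined, $(x - 2)g(x)$ makes sense
at $x = 2$ and is equal to $-\tfrac{2}{3}$.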

It is easy to keep going. To handle general functions
locally at $a$, we may sometimes need to use fractional
powers of $(x - a)$, but it does not get much worse than
that. Such expansions are a very powerful tool in the
theory of functions. One of the motivations for the
discovery of the p-adic numbers was to find a similarly
powerful tool for the study of numbers.


2 Numbers Are like Functions

It was Dedekind [VI.50] and Heinrich Weber who first
realized that an analogy could be drawn between numbers
and functions. In their scheme, positive whole
numbers were compared to polynomials, while fractions
were analogous to quotients of polynomials such
as the function $g(x)$ above. More complicated functions
were like more complicated kinds of number.
Elliptic functions [V.34], for example, were similar to
certain kinds of algebraic number. On the other hand,
functions like $\sin(x)$ were more like transcendental
numbers [III.43] such as $e$ or $\pi$.

Dedekind and Weber pushed the idea that “functions
are like numbers” in order to understand functions
better. In particular, they showed that the techniques
developed to study algebraic numbers could be used
to study a whole class of functions, which came to
be known as algebraic functions. It was Kurt Hensel,
however, who saw that if functions are like numbers,
then numbers must be like functions. In particular,
he set out to find an analogue, for numbers, of the
local expansions that were so useful in the theory of
functions.

To get to Hensel’s idea, let us start by noticing that
the way we usually represent numbers already points in
the right direction. After all, an expression like 34291
really means

$$34291 = 1 + 9 \cdot 10 + 2 \cdot 10^2 + 4 \cdot 10^3 + 3 \cdot 10^4.$$

If we allow ourselves to think of 10 as being something
like the variable $x$, this looks exactly like a polynomial.
What is more, just as we can expand a polynomial in
terms of different expressions $(x - a)$, we can write
numbers in other bases. For example,

$$34291 = 4 + 4 \cdot 11 + 8 \cdot 11^2 + 3 \cdot 11^3 + 2 \cdot 11^4.$$

It is easy to see how to find this expansion. First, divide
34291 by 11, and look at the remainder. It is 4. That
is our first term. Next, subtract 4 from the original
number to get something divisible by 11:

$$34291 - 4 = 34287 = 3117 \cdot 11.$$

Now divide 3117 by 11 to find the next remainder,
which will give the second term. Keep repeating this
process, and you will find the base-11 expansion.
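
The divide-and-take-remainder procedure is a short loop. In the following Python sketch the function name `digits` and the lowest-power-first ordering are illustrative choices.

```python
def digits(n, base):
    """Digits of the positive integer n in the given base, lowest power first."""
    out = []
    while n > 0:
        n, remainder = divmod(n, base)   # remainder is the next digit
        out.append(remainder)
    return out

def from_digits(ds, base):
    """Rebuild the integer from its digits, lowest power first."""
    return sum(d * base ** i for i, d in enumerate(ds))
```

Here `digits(34291, 11)` returns `[4, 4, 8, 3, 2]`, the coefficients of the base-11 expansion above, and `digits(34291, 10)` returns `[1, 9, 2, 4, 3]`.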

That sounds very promising, but there is one little
insight missing. The fact is that 10 is not really like
$(x - 2)$, because 10 can be factored, while $(x - 2)$ cannot.
So expanding a number in base 10 is a little like trying
to express a polynomial in powers of $(x^2 - 3x + 2)$,
which factors as $(x - 1)(x - 2)$. Such an expansion is not
really local, since it is looking at two possible values of
$x$ at once. Similarly, the base-10 expansion mixes
information about 2 and information about 5. The upshot is
that we should always use a prime number as our base.

Just to fix ideas, let us choose $p = 11$. We already
know that we can write positive numbers in base 11,
i.e., as “polynomials in powers of 11.” What happens if
we try it with a fraction? Let us take $\tfrac{1}{2}$. The first step
is to find the remainder, that is, to find a number $r$
(positive, between 0 and 10) such that $\tfrac{1}{2} - r$ is divisible
by 11. Well, $\tfrac{1}{2} - 6 = -\tfrac{11}{2} = -\tfrac{1}{2} \cdot 11$. So the first term is
6. (To see what is meant by divisibility here, consider
what would have happened if we had taken $r = 4$. Then
$\tfrac{1}{2} - r$ would have been $-\tfrac{7}{2}$, and if we divide that by 11
we get $-\tfrac{7}{22}$, which has a factor of 11 in the denominator.
It is this that is not allowed and that does not happen
when $r = 6$.)

Now we repeat with the quotient, which was $-\tfrac{1}{2}$. We
see that $-\tfrac{1}{2} - 5 = -\tfrac{11}{2} = -\tfrac{1}{2} \cdot 11$. So the second term
will be $5 \cdot 11$. But now we find ourselves having to do
$-\tfrac{1}{2}$ again! So we will do this again and again, and all
of the remaining terms will have coefficient 5. In other
words,

$$\tfrac{1}{2} = 6 + 5 \cdot 11 + 5 \cdot 11^2 + 5 \cdot 11^3 + 5 \cdot 11^4 + 5 \cdot 11^5 + \cdots.$$

It is not clear quite what the equals sign means here,
but in any case we have obtained an infinite expansion
in powers of 11. It is called the 11-adic expansion of $\tfrac{1}{2}$.
Furthermore, the expansion “works” when we do
arithmetic with it. For example, if we multiply it by 2 and do
all the rearranging ($2 \times 6 = 12 = 1 + 11$, so carry a 1,
etc.) we do end up with 1.
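
The same remainder-and-quotient loop works for fractions once “remainder” is read modulo $p$. The Python sketch below computes the first few $p$-adic digits of a rational number whose denominator is not divisible by $p$ (so no negative powers are needed). The function name is hypothetical, and the modular inverse `pow(d, -1, p)` assumes Python 3.8 or later.

```python
from fractions import Fraction

def padic_digits(q, p, n_terms):
    """First n_terms digits of the p-adic expansion of q, lowest power first.

    Assumes the denominator of q is not divisible by the prime p.
    """
    q = Fraction(q)
    if q.denominator % p == 0:
        raise ValueError("negative powers of p would be needed")
    out = []
    for _ in range(n_terms):
        # The digit r in 0..p-1 for which q - r is divisible by p.
        r = (q.numerator * pow(q.denominator, -1, p)) % p
        out.append(r)
        q = (q - r) / p                  # the quotient, to be expanded next
    return out
```

For example, `padic_digits(Fraction(1, 2), 11, 6)` returns `[6, 5, 5, 5, 5, 5]`. Doubling the corresponding partial sum gives $1 + 11^6$, which agrees with 1 up to the first omitted power of 11, in line with the carrying argument in the text.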

Hensel showed that one can do this with all algebraic 
numbers as long as one allows infinite expansions, a 
finite number of negative powers of 11 (so that one 
can handle ^ and similar things), and, in certain cases, 
fractional powers of 11. He argued that we should view 
such expansions as giving information “locally at 11.” 
The same happens with all of the prime numbers. So if 
we have a prime number p we can consider our numbers
“locally at p” by taking their expansions in powers
of p. These we call their p-adic expansions. Just as in 
the case of functions, such expansions immediately tell 
us how divisible by p a number is, while hiding all the 
information about other primes; in that sense, they are 
truly “local.” 

3 p-adic Numbers 

The best answers always raise new questions. Having
discovered that any rational number has a p-adic
expansion, and that one can “do arithmetic” directly 
with the expansions, it is inevitable to ask whether we 
have therefore enlarged the world of numbers under 
consideration. Once we have chosen the prime p, any
rational number gives us a p-adic expansion. But does
every such expansion come from a rational number? 

Not a chance. It is easy to see that the set of all expan- 
sions is much bigger than the set of all rational num- 
bers. Hensel’s next move, then, was to point out that 
the set $\mathbb{Q}_p$ of all possible p-adic expansions is a new
realm of numbers, which he called the p-adic numbers. 
It includes not only all the rational numbers, but also a 
lot more. 

The best way to think of $\mathbb{Q}_p$ is by analogy with the set
$\mathbb{R}$ of all real numbers. Real numbers are usually given by
their decimal expansions. When we write e = 2.718...
what we mean is that

$$e = 2 + 7 \cdot 10^{-1} + 1 \cdot 10^{-2} + 8 \cdot 10^{-3} + \cdots.$$

The set of all such expansions is the set of all real num- 
bers. It contains all the rational numbers, but is much 
bigger. 

Of course, except for the fact that both contain the
rationals, these two realms are almost completely different.
For example, in both $\mathbb{Q}_p$ and $\mathbb{R}$ there is a natural
notion of “distance between two numbers.” But
these distances are completely different, even when the 
numbers in question are rational. So, in the reals, 2 is 
very close to 2001/1000. In the 5-adics, however, the 
distance between these two numbers is quite large! 
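
The comparison can be checked numerically. The text does not spell out the p-adic distance, so the sketch below assumes the standard definition: the distance between x and y is $p^{-v}$, where $v$ is the exponent of p in the factorization of $x - y$ (negative if p divides the denominator). The function names are invented for this illustration.

```python
from fractions import Fraction

def p_adic_valuation(x: Fraction, p: int) -> int:
    """Exponent of p in x: power of p in the numerator minus power in the denominator."""
    if x == 0:
        raise ValueError("the valuation of 0 is infinite")
    v, num, den = 0, abs(x.numerator), x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def p_adic_distance(x: Fraction, y: Fraction, p: int) -> Fraction:
    """Assumed standard definition: |x - y|_p = p^(-v), and 0 when x == y."""
    if x == y:
        return Fraction(0)
    return Fraction(p) ** (-p_adic_valuation(x - y, p))

a, b = Fraction(2), Fraction(2001, 1000)
print(float(abs(a - b)))         # 0.001  (real distance: very close)
print(p_adic_distance(a, b, 5))  # 125    (5-adic distance: far apart)
```

The difference is $1/1000 = 1/(2^3 \cdot 5^3)$, so the factor $5^3$ in the denominator is what makes the 5-adic distance large.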

It turns out that we can do calculus with p-adic num- 
bers, just as we do it with reals. Many other math- 
ematical ideas also extend. So Hensel’s ideas led to 
a system of “parallel (numerical) universes”— one for 
each prime, plus the real numbers— in which we can do 
mathematics. 

4 The Local-Global Principle 

At first, most mathematicians seem to have found 
Hensel’s new numbers interesting in a formal way, but 
also to have wondered what the point of them was. 
One does not adopt a new number system just for fun; 
it needs to be useful for something. Hensel was fas- 
cinated by his numbers and kept writing about them, 
but to begin with he had trouble demonstrating their 
usefulness. He showed, for example, that they could be 
used to develop the basics of algebraic number theory 
in a new way— but most folks seemed happy with the 
old way. 

One can demonstrate the power of a new idea by 
giving a beautiful and easy proof of a difficult result. 
Hensel wrote a paper purporting to do just that: he 
gave an easy and elegant p-adic proof that the num- 
ber e is transcendental. This did get people’s attention. 
Unfortunately, when they looked hard at the proof they
realized that it contained a subtle error. As a result, 
mathematicians’ attitude of suspicion about Hensel’s 
strange new numbers was reinforced. 

The tide was turned by Helmut Hasse. He had been 
studying in Göttingen. At one point, he walked into a
used bookstore and found a copy of Hensel (1913), a 
book written a few years earlier. Hasse was fascinated, 
and moved to Marburg to study with Hensel. A cou- 
ple of years later, in 1920, he found the idea that was 
to make the p-adic numbers a crucial tool for number 
theorists. 

What Hasse showed was that it was possible to 
answer some questions in number theory by answer- 
ing them “locally.” Here is a (not very important, but 
fairly easy to follow) example. Suppose x is a rational 
number that is a square of some other rational number 
$y$, so $x = y^2$. Since all rational numbers are also p-adic,
it is true that for every prime number p the number x, 
thought of as a p-adic number, is a square. And simi- 
larly, the real number x is a square. In other words, the 
rational number y is a kind of “global” square root, in 
that it serves as a square root in each local setting. 

So far, so boring. But now reverse the thing. Suppose 
that we know that for every prime number p the num- 
ber x, thought of as a p-adic number, is the square of 
some p-adic number (which may depend on p), and also
that x, thought of as a real number, is the square of
some real number. A priori, these local square roots of
x could all be different! But it turns out that under these
assumptions x must be the square of some rational
number, so that in fact all the local roots must come
from a “global” root. 

This leads us to think of the rational numbers as 
“global” and of the various $\mathbb{Q}_p$ and of $\mathbb{R}$ as “local.”
Then the previous paragraph claims that the property 
of “being a square” is true globally if and only if it is 
true “everywhere locally.” This turns out to be a pow- 
erful and illuminating idea, and it has become known 
as the Hasse principle or the local-global principle. 

Our example, of course, demonstrates the principle 
in its strongest case: solve a problem locally in all cases, 
and you have solved it globally. That is often too much
to hope for. Nevertheless, attacking a problem locally 
and then putting the local pieces together has become 
a fundamental technique in modern number theory. It 
has been used to simplify older proofs, as in class 
field theory [V.30], and also to obtain new results, 
as in Wiles’s proof of Fermat’s last theorem [V.12].
So Hensel was right after all: his new numbers have
earned their place along with the real numbers in every
number theorist’s heart.

The Logarithmic Function

See THE EXPONENTIAL AND LOGARITHMIC 
FUNCTIONS [III.25] 


III.54 The Mandelbrot Set


Suppose we have a complex polynomial $f$ defined by
a formula $f(z) = z^2 + C$ for some complex number $C$.
Then for any choice of complex number $z_0$ we can form
a sequence $z_0, z_1, z_2, \ldots$ by iterating, that is, repeatedly
applying, the function $f$. So we let $z_1 = f(z_0)$, $z_2 = f(z_1)$,
and so on. Sometimes the resulting sequence will
tend to infinity, but sometimes it remains bounded,
that is, it stays within a fixed distance from 0. For example,
if we take $C = 2$ and start with $z_0 = 1$, then
the sequence goes 1, 3, 11, 123, 15131, ... and clearly
tends to infinity, whereas if we start with $z_0 = \tfrac{1}{2}(1 - i\sqrt{7})$,
then we find that $z_1 = z_0^2 + 2 = z_0$, so the sequence
is bounded since all its terms are equal to $z_0$. The Julia
set associated with the constant $C$ is the set of all $z_0$ for
which the sequence remains bounded. Julia sets often
have a fractal shape (see [IV.15 §2.5]).

To define a Julia set, one fixes $C$ and considers different
possibilities for $z_0$. What happens if one fixes $z_0$
and considers different possibilities for $C$? The result
is the Mandelbrot set. The precise definition is that it
is the set of all $C$ such that the sequence is bounded
if you take $z_0 = 0$. (One could consider other values
of $z_0$, but the resulting sets are not interestingly different
because they are related to each other by a simple
change of variables.)
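
Both definitions can be explored with a short iteration loop. The sketch below is a heuristic, not a proof of boundedness: it assumes the standard escape criterion (once $|z|$ exceeds both 2 and $|C|$ the orbit must tend to infinity) and it gives up after a fixed number of steps. The function name and the `max_iter` cutoff are choices made for this illustration.

```python
def stays_bounded(c: complex, z0: complex = 0j, max_iter: int = 200) -> bool:
    """True if the orbit of z0 under z -> z*z + c has not escaped after max_iter steps."""
    # Standard fact (assumed here): if |z| > max(2, |c|), the orbit tends to infinity.
    radius = max(2.0, abs(c))
    z = z0
    for _ in range(max_iter):
        if abs(z) > radius:
            return False
        z = z * z + c
    return True

# Mandelbrot-set test: fix z0 = 0 and vary C.
print(stays_bounded(-1))       # True  (orbit 0, -1, 0, -1, ...)
print(stays_bounded(1))        # False (orbit 0, 1, 2, 5, ...)

# Julia-set test for C = 2: vary z0 instead.
print(stays_bounded(2, z0=1))  # False (orbit 1, 3, 11, 123, ...)

# One step from the fixed point (1 - i*sqrt(7))/2 returns (almost exactly) to it.
z0 = 0.5 * (1 - 1j * 7 ** 0.5)
print(abs(z0 * z0 + 2 - z0) < 1e-12)  # True
```

Only a single step is checked at the fixed point: it is a repelling fixed point, so rounding errors grow under repeated iteration and a long floating-point orbit started there would eventually drift away.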

The Mandelbrot set also has an intricate fractal 
shape — one that has captured the popular imagina- 
tion. The detailed geometry of the Mandelbrot set is 
not yet fully understood; some of the resulting open 
problems are of major importance because they encode 
very general information about dynamical systems. See 
Dynamics [IV.15 §2.8] for more details.


III.55 Manifolds


The surface of a sphere has the property that if you look 
at a very small portion of it then that portion will look 
like part of a plane. More generally, a d-dimensional 
manifold, or d-manifold, is a geometrical object that 
looks “locally” like d-dimensional euclidean space 
[I.3 §6.2]. Thus, 2-manifolds are smooth surfaces such
as those of a sphere or a torus. Higher-dimensional
manifolds are harder to visualize, but are a major
topic of research. The basics of manifolds are set out
in SOME FUNDAMENTAL MATHEMATICAL DEFINITIONS
[I.3 §§6.9, 6.10]. More advanced ideas are discussed in
DIFFERENTIAL TOPOLOGY [IV.9] and ALGEBRAIC TOPOLOGY
[IV.10]. See also algebraic geometry [IV.7], moduli
spaces [IV.8], and Ricci flow [III.80]. (Even this is
far from a complete list of articles in which manifolds 
feature.) 


III.56 Matroids

Dominic Welsh 


The original aim of Hassler Whitney when he introduced
the concept of a matroid in 1935 was to produce
an abstract notion that would capture the main ingredients
of the structure of a set of vectors in a vector
space [I.3 §2.3], while avoiding any explicit mention of
linear independence.

To do this he singled out two fundamental proper- 
ties and postulated that any family of subsets that pos- 
sessed these properties was the collection of “indepen- 
dent sets” of a “matroid.” The first of these properties 
was an obvious one: any subset of a linearly indepen- 
dent set is also linearly independent. The second prop- 
erty was more subtle: if A and B are two linearly inde- 
pendent sets and B contains more elements than A, 
then there exists some element of B that is not in A 
but which, when added to A, gives a set that is still linearly
independent. Finally, in order to avoid trivialities
he insisted that in every matroid the empty set must be
independent.

Thus, formally, a matroid is defined to be a finite set 
E together with a family of subsets of E which are called 
the independent sets and which satisfy the following 
axioms. 

(i) The empty set is independent.

(ii) Every subset of an independent set is independent.

(iii) If A and B are independent sets, with the number
of elements of A being one less than the number
of elements of B, then there is some x in B that is
not in A such that $A \cup \{x\}$ is also independent.

Property (iii) is called the exchange axiom. The most 
fundamental example of a matroid is a set of vectors 
in a vector space with the “independent sets” being 
the usual linearly independent ones: in this case the 
exchange axiom is known as Steinitz’s exchange lemma. 
However, there are many examples of matroids that are 
not subsets of vector spaces. 

Here, for example, is an important class of matroids 
that arise from graph theory. A cycle in a graph is a 
collection of edges of the form
$(v_1, v_2), (v_2, v_3), \ldots, (v_{k-1}, v_k), (v_k, v_1)$,
where the $v_i$ are distinct vertices.
Take any graph and call a subset of edges “indepen- 
dent” if it contains no cycle. 

So here we are thinking of a cycle among the edges 
as being in some way similar to a linear dependence 
among some vectors. It is obvious that any subset of 
an independent set will also not contain a cycle, so con- 
dition (ii) is satisfied. Slightly less obvious is that if A 
and B are sets of t and t + 1 edges, respectively, nei- 
ther containing a cycle, then there will be at least one 
edge in B but not in A which can be added to A without 
creating a cycle. So we see that this is another example 
of a matroid, even though it arises in a very different 
context from the vector space one. 

As it turns out, there is a way of identifying the edges
of a graph with a set of vectors in a vector space over
the field $\mathbb{F}_2$ of integers mod 2 (see modular arithmetic
[III.60]). If G has n vertices and one associates
with each vertex a basis element of $\mathbb{F}_2^n$, then one can
associate with each edge the vector that is given by the
sum of the basis elements corresponding to its two endpoints.
A set of edges will then be independent if and
only if the corresponding vectors in $\mathbb{F}_2^n$ are linearly
independent. However, as we shall see, there are important
examples of matroids that are not even isomorphic to
sets of vectors.
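
The identification of edges with vectors over $\mathbb{F}_2$ is easy to try out. In the sketch below, vertices are numbered 0, 1, 2, ..., a vector in $\mathbb{F}_2^n$ is stored as an integer bitmask (bit i is the coefficient of the i-th basis element), and addition of vectors is XOR. These representation choices and the function names are assumptions made for the illustration.

```python
def edge_vector(u: int, v: int) -> int:
    """The vector e_u + e_v in F_2^n, stored as a bitmask."""
    return (1 << u) ^ (1 << v)

def independent_over_f2(vectors) -> bool:
    """Gaussian elimination over F_2, where adding two vectors is XOR."""
    basis = {}  # leading bit position -> reduced vector with that leading bit
    for vec in vectors:
        while vec:
            top = vec.bit_length() - 1
            if top not in basis:
                basis[top] = vec
                break
            vec ^= basis[top]  # cancel the leading bit
        else:
            # vec was reduced to 0, so it is a sum of earlier vectors
            return False
    return True

triangle = [edge_vector(0, 1), edge_vector(1, 2), edge_vector(2, 0)]
path = [edge_vector(0, 1), edge_vector(1, 2), edge_vector(2, 3)]
print(independent_over_f2(triangle))  # False: the three edges form a cycle
print(independent_over_f2(path))      # True: a path contains no cycle
```

The triangle fails because its three edge vectors sum to zero: every vertex appears in exactly two of the edges, and 1 + 1 = 0 in $\mathbb{F}_2$.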

Figure 1 Two graphs giving rise to the same matroid.


Note that the collection of the independent sets (in a
graph) conveys part of the information present in the
graph, but by no means all of it. For example, consider
the graphs G and H in figure 1. As graphs, G and H
are distinct, but both give the same matroid on the set
$\{a, b, c, d\}$ (the independent sets are all subsets of size
less than or equal to 3, except for $\{a, b, c\}$). Note that
this matroid is also the same as the matroid formed by
the columns (labeled a, b, c, d from left to right) of the matrix

$$A = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

However, it turns out that most matroids do not come 
from either graphs or matrices. 
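
For a small ground set, axioms (i)-(iii) can be verified by brute force. The following sketch checks them for the matroid on {a, b, c, d} described above (every subset of size at most 3 except {a, b, c}); the function name `is_matroid` and the representation of the family as a collection of Python sets are choices made for this illustration.

```python
from itertools import combinations

def is_matroid(ground, independent) -> bool:
    """Brute-force check of axioms (i)-(iii) for a family of subsets of a finite set."""
    fam = {frozenset(s) for s in independent}
    if any(not s <= frozenset(ground) for s in fam):
        return False
    # (i) the empty set is independent
    if frozenset() not in fam:
        return False
    # (ii) every (proper) subset of an independent set is independent
    for s in fam:
        for k in range(len(s)):
            if any(frozenset(t) not in fam for t in combinations(s, k)):
                return False
    # (iii) exchange: if |A| = |B| - 1, some x in B but not in A keeps A + x independent
    for a in fam:
        for b in fam:
            if len(a) + 1 == len(b) and not any(a | {x} in fam for x in b - a):
                return False
    return True

ground = "abcd"
# All subsets of size at most 3, except {a, b, c}.
figure1 = [set(c) for k in range(4) for c in combinations(ground, k)
           if set(c) != set("abc")]
print(is_matroid(ground, figure1))  # True

# A family that satisfies (i) and (ii) but not the exchange axiom.
print(is_matroid("abc", [set(), {"a"}, {"b"}, {"c"}, {"a", "b"}]))  # False
```

In the second family, A = {c} and B = {a, b} are both independent, but neither {a, c} nor {b, c} is, so axiom (iii) fails.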

Although a matroid is defined by very simple axioms, 
many basic results from both linear algebra and graph 
theory can be extended to the wider setting of matroids. 
For example, suppose that G is a connected graph. It is
not hard to prove that if B is a maximal independent 
set of the matroid on G, then B is a tree which is inci- 
dent with every vertex of G. Such a tree is called a span- 
ning tree of G. All spanning trees of a connected graph 
have the same number of edges, namely, one less than 
the number of vertices. Similarly, in a vector space, or 
indeed in any subset of vectors, all maximal linearly 
independent sets have the same size. Both of these are 
special cases of the general result that in any matroid 
all maximal independent sets have the same size. This 
common size is called the rank of the matroid and, by 
analogy with vector spaces, a maximal independent set 
in a matroid is called a base. 

Matroids arise naturally in many parts of mathemat- 
ics, and they often turn up unexpectedly. For example, 
consider the minimum connector problem: a company 
needs to connect a number of cities by links, such as
railways or phone cables, and wishes to minimize the 
total cost. This is clearly equivalent to the following 
problem. Given a connected graph G, with each edge e 
having a nonnegative weight $w(e)$, find a set of edges
that has the minimum total weight but that connects all
the vertices of G. It is not hard to see that this problem
reduces to finding a spanning tree of minimum weight.

Figure 2 A graph with edge-weights.

For this there is a classical algorithm. It is the sim- 
plest possible algorithm one could imagine for the 
problem, and it works as follows. Start by choosing an 
edge of minimum weight, and at each subsequent step 
add an edge of minimum weight to your chosen set 
provided that at no stage a cycle is formed. 

For example, consider the graph in figure 2. The algorithm
would successively select the edges (a, b), (b, c),
(d, f), (e, f), (c, d), giving a spanning tree of total
weight 1 + 2 + 3 + 5 + 7 = 18. Because of the way it
works, the algorithm is known as a greedy algorithm.
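
The greedy procedure (usually attributed to Kruskal) can be sketched as follows. Figure 2 is not reproduced here, so the edge list below is hypothetical: its weights were chosen only so that the algorithm selects the same five edges, in the same order and with the same total of 18, as in the example above. The union-find bookkeeping used to detect cycles is an implementation choice, not something taken from the text.

```python
def greedy_spanning_tree(vertices, weighted_edges):
    """Scan edges by increasing weight and keep each one that does not close a cycle."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    chosen = []
    for weight, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v lie in different components, so no cycle is formed
            parent[ru] = rv
            chosen.append((u, v, weight))
    return chosen

# Hypothetical weights (the real figure 2 may differ).
edges = [(1, "a", "b"), (2, "b", "c"), (3, "d", "f"), (4, "a", "c"),
         (5, "e", "f"), (6, "d", "e"), (7, "c", "d"), (8, "b", "e")]
tree = greedy_spanning_tree("abcdef", edges)
print([(u, v) for u, v, _ in tree])
# [('a', 'b'), ('b', 'c'), ('d', 'f'), ('e', 'f'), ('c', 'd')]
print(sum(w for _, _, w in tree))  # 18
```

The edges of weight 4, 6, and 8 are skipped because each would join two vertices that are already connected by previously chosen edges.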

At first sight, it seems rather unlikely that this algo- 
rithm could work, as it denies the possibility that 
choosing a suboptimal edge now might have a payoff 
later. However, it is not hard to show that the algorithm 
is actually correct. In fact, it extends in almost exactly
the same way to matroids in general: what it gives is 
a (rather fast) algorithm for selecting a base of mini- 
mum weight in a matroid in which each element has a 
nonnegative weight. 

Somewhat more surprisingly, matroids are the only 
structures for which the greedy algorithm works. More 
precisely, suppose that $\mathcal{I}$ is a family of subsets of a set
E with the property that if $A \in \mathcal{I}$ and $B \subseteq A$, then $B \in \mathcal{I}$.
Now let w be any weight function and suppose that
the problem is to select a member B of $\mathcal{I}$ which has
maximum weight, where the weight of a set is just the
sum of the weights of its elements. As above, the greedy
algorithm starts by choosing an element e of maximum
weight and then successively picks elements of maximum
weight from the remaining elements subject to
the proviso that at each stage, the set of elements chosen
is a member of $\mathcal{I}$. It turns out that the following
is true: the greedy algorithm works on $\mathcal{I}$ for all weight
functions w if and only if $\mathcal{I}$ is the collection of independent
sets of a matroid. Thus, matroids form a “natural
home” for many optimization problems. Moreover, the
concept is genuinely useful, since many of the matroids
that arise in such problems are not derived from either
vector spaces or graphs.
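
A tiny experiment illustrates one direction of this statement. The two families and the weights below are invented for the illustration: the first is closed under taking subsets but violates the exchange axiom, and the greedy algorithm misses the optimum on it; the second (all subsets of size at most 2) is a matroid, and greedy succeeds.

```python
from itertools import combinations

def greedy_max(elements, weight, is_independent):
    """Add elements in order of decreasing weight whenever the set stays in the family."""
    chosen = set()
    for e in sorted(elements, key=weight, reverse=True):
        if is_independent(chosen | {e}):
            chosen.add(e)
    return chosen

w = {"a": 3, "b": 2, "c": 2}  # hypothetical weights

def total(s):
    return sum(w[e] for e in s)

def compare(family):
    """Return (weight found by greedy, true maximum weight over the family)."""
    picked = greedy_max("abc", w.get, lambda s: frozenset(s) in family)
    return total(picked), max(total(s) for s in family)

not_matroid = [frozenset(s) for s in (set(), {"a"}, {"b"}, {"c"}, {"b", "c"})]
uniform = [frozenset(c) for k in range(3) for c in combinations("abc", k)]

print(compare(not_matroid))  # (3, 4): greedy takes a and is then stuck
print(compare(uniform))      # (5, 5): greedy finds the optimum
```

Scanning the elements once in decreasing order of weight matches the description above because the family is closed under subsets: an element that cannot be added now can never be added after the chosen set has grown.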


III.57 Measures 


To understand measure theory, and to see why it is use- 
ful and important, it is instructive to start with a prob- 
lem about lengths. Suppose that we have a sequence 
of intervals in [0, 1] (the closed interval from 0 to 1), 
of total length less than 1. Can they cover [0,1]? In 
other words, given intervals $[a_1, b_1], [a_2, b_2], \ldots$, with
$\sum (b_n - a_n) < 1$, is it possible that their union equals
[0, 1]?

One is tempted to answer “no, as the total length is 
too small.” But this is just to restate the question. After 
all, why should “total length less than 1” actually imply 
that the intervals cannot cover [0, 1]? Another tempt- 
ing answer is to say “just rearrange the intervals so that 
they go from the left to the right, and then we never get 
to the right-hand end of [0, 1].” In other words, if the 
$n$th interval has length $b_n - a_n = d_n$, then just translate
the intervals to be the intervals $[0, d_1], [d_1, d_1 + d_2], \ldots$.
In this rearrangement, it is indeed true that we never
cover any point beyond $\sum d_n$, and so do not cover
all of [0, 1], but why does this imply that the original
intervals do not cover [0, 1]?

It is quite easy to see that this rearrangement argu- 
ment works for a finite number of intervals, but it does 
not work in general. Indeed, suppose we ask the same 
question, but for the rationals: that is, let us replace 
the interval [0, 1] by the rational interval $[0, 1] \cap \mathbb{Q}$. If
our intervals have lengths $\tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{16}, \ldots$, for example, so
that the total length is only $\tfrac{1}{2}$, then certainly the left-to-right
intervals will cover only the interval $[0, \tfrac{1}{2}] \cap \mathbb{Q}$,
but it is possible for the original intervals to cover all
of $[0, 1] \cap \mathbb{Q}$, since we can just enumerate the rationals
as $q_1, q_2, \ldots$ (see countable and uncountable sets
[III.11]), and then put an interval of length $\tfrac{1}{4}$ around $q_1$,
one of length $\tfrac{1}{8}$ around $q_2$, and so on.

This observation shows that the answer to our prob- 
lem must involve properties of the reals that are not 
shared by the rationals— which wrecks any kind of “it 
is obvious” argument. In fact, the result is true for the
reals, but its proof is a good exercise. 

Why is this an important fact? It stems from a wish
to define “length” for general sets of reals (for simplic- 
ity, we will concentrate on [0, 1], just to avoid some 
technicalities about “infinite length”). What should the 
“length” of a set be? For intervals the answer is clear,
and it is also clear for finite unions of intervals. But
what about sets like $\{\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \ldots\}$, or $\mathbb{Q}$ itself?

A natural first attempt would be to use finite unions 
of intervals: one could take the length of a set A to be 
the least value of the length of a finite union of intervals
that covers A. More precisely, one could define
the length of A to be the infimum of
$(b_1 - a_1) + \cdots + (b_n - a_n)$, taken over all finite unions of intervals
$[a_1, b_1] \cup \cdots \cup [a_n, b_n]$ that cover A.

Unfortunately, this definition has some very undesir- 
able properties. For example, the length of the set of all 
rational numbers in the interval [0, 1] would then be 1,
as would the length of all irrational numbers in [0, 1]. 
We would thus have two disjoint sets (and very natural 
ones at that) such that the length of their union is not 
the sum of their lengths. So this form of “length” is not 
really well-behaved for such sets. 

What we want is a notion of length that applies to all 
the sets we know and are used to, and is additive, meaning
that the length of $A \cup B$ is the sum of the lengths of A
and B whenever A and B are disjoint. Remarkably, this
can be achieved, and the key idea is to allow countable
covers. That is, we modify the above definition as follows:
the length (or measure, to give it its usual name) of
a set A is the infimum of $(b_1 - a_1) + (b_2 - a_2) + \cdots$, taken
over all unions of intervals $[a_1, b_1] \cup [a_2, b_2] \cup \cdots$ that
cover A. Note that, thanks to the puzzle discussed earlier,
the measure of the interval $[a, b]$ is $b - a$, just as
we would hope.

It is also not hard to see that the measure of the 
set of rationals in [0, 1] is zero, and it turns out that 
the measure of the irrationals in [0, 1] is 1. Indeed, any 
countable set has measure zero. In many contexts, sets 
of measure zero are regarded as “negligible” or “of no 
importance.” It is worth mentioning that there are also 
sets of measure zero that are uncountable (an example 
is the Cantor set [III.17]).
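
The claim that every countable set has measure zero follows from a short computation with the countable-cover definition. The following is a sketch; the enumeration $q_1, q_2, \ldots$ of the set and the parameter $\varepsilon$ are notation introduced here for the illustration.

```latex
Let $A = \{q_1, q_2, \ldots\}$ be a countable set and fix $\varepsilon > 0$.
Cover each point $q_n$ by the interval
$\bigl[\,q_n - \varepsilon 2^{-n-1},\; q_n + \varepsilon 2^{-n-1}\,\bigr]$,
which has length $\varepsilon 2^{-n}$. The total length of this cover is
\[
  \sum_{n=1}^{\infty} \varepsilon\, 2^{-n} = \varepsilon ,
\]
so the measure of $A$ is at most $\varepsilon$ for every $\varepsilon > 0$,
and is therefore $0$.
```

Applied to the rationals in [0, 1], this is the same device as in the earlier puzzle, except that the lengths are now allowed to shrink with $\varepsilon$.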

It turns out that, even with this definition, there are 
pairs of disjoint sets A and B such that the measure 
of $A \cup B$ is not the sum of the measures of A and B.
However, it can be shown that for all “reasonable” sets 
the measure is additive. More precisely, one says that a 
subset of [0, 1] is measurable if the measures of it and 
its complement add up to 1, as they should. If A and B 
are disjoint measurable sets, then the measure of their 
union is the sum of their measures. 

This is a very useful fact, since it can be shown that
every set that arises naturally in mathematics, or that
has an explicit definition, is measurable: intervals, finite
unions of intervals, countable unions of intervals, Cantor
sets, things involving rationals or irrationals, and so
on. In fact, the union of any countable family of measurable
sets is again measurable (one says that the measurable
sets form a sigma-algebra). Even better, for measurable
sets the measure is countably additive, meaning
that the measure of a disjoint union of countably
many measurable sets is the sum of the measures of 
the individual sets. 

More generally, in many other settings, one wants to 
end up with a sigma-algebra, containing all the sets 
one is interested in, on which we can define a countably
additive measure, or “length function.” The above
example is called Lebesgue measure on [0,1]. In gen- 
eral, whenever one wishes to define a countably addi- 
tive measure, one always needs a result like the puzzle 
above in order to get started. 

Here is another example: we could work in $[0, 1]^2$
(the unit square in the plane), and base our ideas upon
rectangles instead of intervals. So we would define the
measure of a set as the least total area of a sequence of
rectangles that covers the set. This gives an elegant and
powerful approach to integration: the integral of a function
$f$ (defined on [0, 1], say, and taking values in [0, 1])
is just defined to be the “area under its graph”: that
is, the measure of the set $\{(x, y) : y < f(x)\}$. Many
complicated-looking functions can now be integrated:
for example, the function $f$ that is 1 on the rationals
and 0 on the irrationals is easily checked to have an
integral, namely 0, whereas in earlier theories such as
Riemann integration that function would be too rapidly
varying to be integrable.

This approach to integration gives rise to the so-called
Lebesgue integral (further discussed in the article
on Lebesgue [VI.72]), which is one of the fundamental
concepts in mathematics. It allows one to integrate
a wide range of functions that are not Riemann
integrable, but the main reason for its importance is
not so much this as the fact that the Lebesgue integral
has very good limiting properties that the Riemann
integral lacks. For example, if $f_1, f_2, \ldots$ is a sequence
of Lebesgue-integrable functions from [0, 1] to [0, 1]
and $f_n(x)$ converges to $f(x)$ for every $x$, then $f$ is
Lebesgue-integrable and the Lebesgue integrals of the
functions $f_n$ converge to the Lebesgue integral of $f$.


III.58 Metric Spaces


There are many contexts in mathematics, especially
in analysis, where one would like to say that two
mathematical objects are close, and understand precisely
what that means. If the two objects are the
points $(x_1, x_2)$ and $(y_1, y_2)$ in a plane, then the
task is straightforward: the distance between them is
$\sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2}$, by the Pythagorean theorem,
and it makes sense to say that the points are close if this
distance is small.

Now suppose that we have two points in n-dimensional
space, $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$. It is
a simple matter to generalize the formula just given
when n = 2 and define the distance between them to
be

$$\sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_n - x_n)^2}.$$

Of course, the fact that the formula can be easily generalized
is not in itself a guarantee that the resulting
notion is a sensible definition of distance. And this 
raises the question of what properties we would like 
a definition to have for it to count as sensible. A metric 
space is an abstract notion that results from thinking 
about this question. 

Let X be a set of “points.” Suppose that, given any two 
of these points, x and y say, we have a way of assigning 
a real number d(x,y) that we wish to regard as the 
distance between them. The following three properties 
are ones that it would be highly desirable for this idea 
of distance to have. 

(P1) $d(x, y) \ge 0$ with equality if and only if $x = y$.

(P2) $d(x, y) = d(y, x)$ for any two points $x$ and $y$.

(P3) $d(x, y) + d(y, z) \ge d(x, z)$ for any three points $x$,
$y$, and $z$.

The first of these properties says that the distance 
between two points is always positive, except when the 
two points are the same, when it is zero. The second 
says that distance is a symmetric notion: the distance 
from x to y is the same as the distance from y to x. 
The third is called the triangle inequality: if you imagine
x, y, and z as the vertices of a triangle, it says that the 
length of any side never exceeds the sum of the lengths 
of the other two sides. 

A function d defined on pairs of points (x, y) from
a set X is called a metric if it has properties (P1)-(P3) 
above. In that case, X and d together form a metric 
space. This abstraction of the usual notion of distance 
is very useful, and there are many important examples 
of metrics that are not necessarily derived from the 
Pythagorean theorem. Here are a few examples. 

(i) Let X be n-dimensional space, that is, the set $\mathbb{R}^n$
of all sequences $(x_1, \ldots, x_n)$ of n real numbers.
It can be shown that the formula derived above
from the Pythagorean theorem gives a notion of
distance that does indeed satisfy properties (P1)-(P3).
This metric is called the Euclidean distance
and the resulting metric space is called Euclidean
space. Euclidean spaces are perhaps the single
most basic and important class of metric spaces in
mathematics.

(ii) Information is often transmitted digitally in 
the form of a string of 0s and 1s, such as
000111010010. The Hamming distance between 
two such strings is defined to be the number of 
places where the strings are different. For exam- 
ple, the Hamming distance between the strings 
00110100 and 00100101 is 2, since the strings dif- 
fer in the fourth and eighth places only. This idea 
of distance also satisfies properties (P1)-(P3). 

(iii) If you are driving from one town to another, then 
the distance you care about is not the distance as 
the crow flies but the length of the shortest route 
along the network of available roads. Similarly, if 
you wish to travel from London to Sydney, then 
what matters is the length of the shortest path 
(known as a geodesic) along the Earth’s surface, 
rather than the “actual” distance through the Earth 
itself. Many useful metrics come from this gen- 
eral idea of a shortest route, which guarantees that 
property (P3) will hold. 

(iv) An important feature of Euclidean distance is 
its rotational symmetry: in other words, rotating 
the plane, or space, does not alter the Euclid- 
ean distances between points. There are other 
metrics that also have a great deal of symme- 
try, and these have great geometrical significance. 
In particular, the discovery of the hyperbolic 
metric [I.3 §§6.6, 6.10] in the early nineteenth
century demonstrated that the parallel postulate
could not be proved using Euclid’s other axioms.
This resolved a question that had been open
for thousands of years. See Riemannian metrics
[I.3 §6.10].
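
Of these examples, the Hamming distance in (ii) is the easiest to experiment with. The sketch below computes it and then checks properties (P1)-(P3) by brute force on all binary strings of length 4. The function names are invented for the illustration, and a finite check of this kind is evidence rather than a proof for strings of arbitrary length.

```python
from itertools import product

def hamming(s: str, t: str) -> int:
    """Number of positions at which two strings of equal length differ."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(s, t))

def is_metric(points, d) -> bool:
    """Brute-force check of (P1)-(P3) on a finite list of points."""
    for x in points:
        for y in points:
            if d(x, y) < 0 or (d(x, y) == 0) != (x == y):          # (P1)
                return False
            if d(x, y) != d(y, x):                                  # (P2)
                return False
            if any(d(x, y) + d(y, z) < d(x, z) for z in points):    # (P3)
                return False
    return True

print(hamming("00110100", "00100101"))  # 2

words = ["".join(bits) for bits in product("01", repeat=4)]
print(is_metric(words, hamming))        # True

# Squared difference on numbers is symmetric and positive but fails (P3).
print(is_metric([0, 1, 2], lambda x, y: (x - y) ** 2))  # False
```

The last line shows why (P3) is a genuine restriction: going from 0 to 2 via 1 would cost 1 + 1 = 2, which is less than the direct "distance" of 4.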


III.59 Models of Set Theory 


A model of set theory is, roughly speaking, a struc- 
ture in which the usual axioms of set theory (ZF, or 
ZFC) hold. To explain what this means, let us think first 
about groups. The axioms of group theory mention certain
operations (such as multiplication and inversion),
and a model of group theory is a set, equipped with
such operations, such that the axioms hold. In other
words, a model of group theory is nothing other than a
group. So what does a “model of ZF” mean? The axioms
of ZF mention one relation, namely “is an element of,”
or “$\in$.” A model of ZF is a set M, on which there is a
relation E, such that all the axioms of ZF hold in M if we
replace “$\in$” by “E.”

However, there is one very important difference 
between these two sorts of model. When one first meets 
groups, one starts with some very simple examples, 
such as cyclic groups, or groups of symmetries of regular
polygons, and one then builds up to more sophisticated
examples such as the symmetric and alternating
groups [III.70], and beyond. But this gentle process
is not available for models of ZF. Indeed, since all
of mathematics can be formulated in the language of 
ZF, it follows that every model of ZF has to contain a 
“copy” of the whole world of mathematics. This makes 
studying models of ZF rather difficult. 

One aspect that is often found puzzling is the fact
that a model of ZF is a set. This might seem to mean
that there is a “universal” set (a set that has every set
as a member), but from Russell’s paradox [II.7 §2.1] it
is easy to see that there can be no such set. The answer 
to this apparent problem is that the model M is indeed 
a set in the real mathematical universe, but that inside 
the model there is no universal set — in other words, 
there is no element $x$ of $M$ such that $y \mathrel{E} x$ for every element
$y$ of $M$. Thus, from the perspective of the model,
the statement “there is no universal set” is true. 

See model theory [IV.2] for more about models in 
general, and set theory [IV.1] for more about models
of set theory. 


III.60 Modular Arithmetic 

Ben Green 


Is there a square number whose decimal expansion
ends ... 7? Is 438 345 divisible by 9? For which positive
integers n is $n^2 - 5$ a power of two? Is $n^7 - 77$ ever
a Fibonacci number?

These questions, and more, can be answered using 
modular arithmetic. Let us look at the first question. 
Listing the first few squares, 1, 4, 9, 16, ... , one does not 
find any whose final digit is 7. In fact, writing down just
the final digits, one gets the sequence

1, 4, 9, 6, 5, 6, 9, 4, 1, 0, 1, 4, 9, 6, 5, 6, ...,

which seems to repeat (and thus never contain the
number 7).

An explanation of this phenomenon is as follows. Let 
$n$ be a number to be squared. We can always write $n$ as
a multiple of 10 plus a remainder; that is, $n = 10q + r$,
where $r \in \{0, 1, \ldots, 9\}$. Now, if we square $n$ we get

$$n^2 = (10q + r)^2 = 100q^2 + 20qr + r^2 = 10(10q^2 + 2qr) + r^2.$$

The only part of this expression that affects the 
final digit is the $r^2$, which immediately explains why
the sequence of last digits of squares repeats with 
period 10, and hence contains no 7s. 
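The argument is easy to check by machine. The following Python sketch (the function name `last_digits_of_squares` is ours, chosen purely for illustration) lists the final digits of the first forty squares and confirms that they repeat with period 10 and never include 7.

```python
def last_digits_of_squares(count):
    """Return the final decimal digits of 1^2, 2^2, ..., count^2."""
    return [(n * n) % 10 for n in range(1, count + 1)]

digits = last_digits_of_squares(40)
# The final digit depends only on n mod 10, so the list has period 10.
period_ok = all(digits[i] == digits[i + 10] for i in range(30))
# The digits that actually occur as last digits of squares.
possible = sorted(set(digits))
print(digits[:10], period_ok, possible)
```

Only the ten residues r = 0, 1, ..., 9 ever need to be squared, which is exactly the content of the displayed identity.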

Modular arithmetic is essentially just a notation for 
writing down arguments of this sort. If two numbers 
(like n and r) leave the same remainder on division 
by 10, then we say that they are congruent modulo 10 
and write $n \equiv r \bmod 10$. What we proved above is the
statement that, if $n \equiv r \bmod 10$, then $n^2 \equiv r^2 \bmod 10$.

Everything we have just said applies equally well if 
we replace 10 by an arbitrary modulus m: if n and r 
leave the same remainder on division by m, then we 
say that n and r are congruent modulo m and we write 
$n \equiv r \bmod m$. Equivalently, n and r are congruent
modulo m if m divides $n - r$. (An integer a is said to
divide another integer b if b is an integer multiple of
a.) The above argument is just one instance of the following
general fact, which is not hard to prove: if
$a \equiv a' \bmod m$ and $b \equiv b' \bmod m$, then $ab \equiv a'b' \bmod m$
and $a + b \equiv a' + b' \bmod m$.

Now observe that $10 \equiv 1 \bmod 9$. It follows that
$10 \times 10 \equiv 1 \times 1 \equiv 1 \bmod 9$, and in fact that
$10^d \equiv 1 \bmod 9$ for any $d \in \mathbb{N}$. Suppose that we have
a number N whose decimal expansion is
$a_d a_{d-1} \cdots a_2 a_1 a_0$. This means that

$$N = a_d 10^d + a_{d-1} 10^{d-1} + \cdots + a_1 10 + a_0.$$

Applying the rules of modular arithmetic, we get

$$N \equiv a_d + a_{d-1} + \cdots + a_1 + a_0 \bmod 9.$$

This gives the well-known test for divisibility by 9: simply
add up the digits of the number in base 10, and
see if the result is divisible by 9. For the example N =
438 345 the sum of the digits is 27, which is divisible
by 9. So N is a multiple of 9 (in fact $N = 9 \times 48\,705$).
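As a small illustration, here is the digit-sum test written out in Python. The helper names `digit_sum` and `divisible_by_9` are our own; the sketch simply applies the congruence displayed above.

```python
def digit_sum(n):
    """Sum of the base-10 digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))

def divisible_by_9(n):
    """Digit-sum test, valid because n is congruent to digit_sum(n) mod 9."""
    return digit_sum(n) % 9 == 0

N = 438345
print(digit_sum(N), divisible_by_9(N), N // 9)
```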

If m is a modulus and n is an integer, then there is 
precisely one value of r between 0 and m - 1 such that 
$n \equiv r \bmod m$. This number r is often called the least
residue or simply the residue of n to the modulus m. 


Now let us consider the third question posed at the 
beginning of this article, namely the matter of when 
$n^2 - 5$ is a power of two. When n = 3, $3^2 - 5 = 4$ is
a power of two, but a little experimentation does not
reveal any further examples. What aspect of the problem
changes as n becomes larger than 3? The key observation
is that $n^2 - 5$ is now greater than 4, and so if it
were a power of 2, then it would have to be divisible
by 8. That would mean that $n^2 \equiv 5 \bmod 8$, but this is
never the case. Indeed, the residues of the first eight
squares are 1, 4, 1, 0, 1, 4, 1, 0, and we know that the
sequence will repeat with period 8 (actually, it repeats
with period 4). Thus it never contains a 5.
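Both halves of this argument can be checked directly. The sketch below (with a hypothetical helper `is_power_of_two` and an arbitrary search bound of 10 000) lists the residues of squares modulo 8 and then searches by brute force for n with $n^2 - 5$ a power of two.

```python
def is_power_of_two(k):
    """True if k is one of 1, 2, 4, 8, ...; false for k <= 0."""
    return k > 0 and (k & (k - 1)) == 0

# Residues of n^2 modulo 8 for n = 1, ..., 8.
residues = [(n * n) % 8 for n in range(1, 9)]

# Brute-force search up to an arbitrary bound.
solutions = [n for n in range(1, 10001) if is_power_of_two(n * n - 5)]
print(residues, solutions)
```

The search is only a sanity check: it is the congruence argument, not the computation, that rules out every n larger than 3.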

Modular arithmetic should be used with care. 
Although the rules for addition and subtraction are 
simple, division is somewhat more subtle. For example,
if one has an equation $ac \equiv bc \bmod m$, it is not, in
general, permissible to divide by c and conclude that
$a \equiv b \bmod m$: consider, for instance, the case a = 2,
b = 4, c = 3, m = 6.

Let us examine what has just gone wrong. To say
that $ac \equiv bc \bmod m$ means that m divides
$ac - bc = (a - b) \times c$. But this clearly does not mean that m divides
$a - b$, since m could divide c (or at least have a common
factor with it). However, if m has no factor in common
with c, then it must divide $a - b$, so in this case
we do indeed have $a \equiv b \bmod m$. In particular, for
any prime number p we have the very useful cancelation
law: if $ac \equiv bc \bmod p$ and $c \not\equiv 0 \bmod p$, then
$a \equiv b \bmod p$.
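The following Python sketch tests, by exhaustive search over residues, when cancelation by c is legitimate modulo m. The function name `cancellation_holds` is ours; the sketch confirms the counterexample above and that, for small moduli, cancelation works exactly when c and m have no common factor.

```python
from math import gcd

def cancellation_holds(c, m):
    """Check whether a*c = b*c (mod m) always forces a = b (mod m)."""
    for a in range(m):
        for b in range(m):
            if (a * c - b * c) % m == 0 and (a - b) % m != 0:
                return False
    return True

# The counterexample from the text: 2*3 and 4*3 agree mod 6, but 2 and 4 do not.
counterexample = (2 * 3 - 4 * 3) % 6 == 0 and (2 - 4) % 6 != 0
print(counterexample, cancellation_holds(3, 6), cancellation_holds(3, 7))
```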

The examples so far may have suggested that the 
principal uses of modular arithmetic are to do with spe- 
cific moduli such as 10 and 8. However, this is far from 
true, and the subject really comes into its own when 
one looks at more general m. For example, one of the 
basic results in number theory is Fermat’s little theorem,
which states that if p is a prime and $a \not\equiv 0 \bmod p$,
then $a^{p-1} \equiv 1 \bmod p$. Let us quickly prove this. Consider
the numbers $a, 2a, 3a, \dots, (p - 1)a \bmod p$. If
$ra \equiv sa \bmod p$, then from the cancelation law we can
deduce that $r \equiv s \bmod p$, from which it follows that
$a, 2a, \dots, (p - 1)a$ are all different modulo p. Furthermore,
none of these numbers is $0 \bmod p$. We are thus
forced to conclude that the numbers
$a, 2a, 3a, \dots, (p - 1)a \bmod p$ are simply a rearrangement of the numbers
$1, 2, 3, \dots, p - 1 \bmod p$. In particular, the products
of the numbers in these two sets are the same, which
implies that

$$a^{p-1}(p - 1)! \equiv (p - 1)! \bmod p.$$

Since $(p - 1)!$ is not a multiple of p, we can apply the
cancelation law again and divide both sides by $(p - 1)!$.
This implies the result.
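The key step of the proof, that multiplying by a merely rearranges the nonzero residues, can be observed concretely. In this Python sketch the values p = 13 and a = 5 are arbitrary illustrative choices and `multiples_mod_p` is a name of our own.

```python
def multiples_mod_p(a, p):
    """Residues of a, 2a, ..., (p-1)a modulo p, in that order."""
    return [(k * a) % p for k in range(1, p)]

p, a = 13, 5  # any prime p and any a not divisible by p would do
multiples = multiples_mod_p(a, p)
is_rearrangement = sorted(multiples) == list(range(1, p))
fermat_value = pow(a, p - 1, p)  # a^(p-1) reduced mod p
print(multiples, is_rearrangement, fermat_value)
```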

Euler’s theorem is a generalization of Fermat’s little
theorem to composite moduli. It states that if m
is a positive integer and a is another positive integer
that is coprime to m (this means that a and m have
no common factor other than 1), then $a^{\phi(m)} \equiv 1 \bmod m$.
Here $\phi$ is Euler’s totient function: $\phi(m)$ is the number of
positive integers less than m that are coprime to m. For instance,
if m = 9, then the integers less than m and coprime
to m are 1, 2, 4, 5, 7, and 8, so $\phi(9) = 6$ and we can
deduce from Euler’s theorem that $5^6 \equiv 1 \bmod 9$. Let us
check this directly: $5^6 = 15\,625$, so the sum of its digits
is 19, which is indeed congruent to 1 mod 9. For further
discussion of the Fermat-Euler theorem, see MATHEMATICS
AND CRYPTOGRAPHY [VII.7], COMPUTATIONAL
NUMBER THEORY [IV.5], and THE WEIL CONJECTURES
[V.38].
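Here is the same example in Python. The function `totient` is a naive counting implementation written for this illustration (it is not a library routine), which is adequate for small m.

```python
from math import gcd

def totient(m):
    """Euler's totient for m >= 2: how many k in 1..m-1 are coprime to m."""
    return sum(1 for k in range(1, m) if gcd(k, m) == 1)

m, a = 9, 5
coprime_residues = [k for k in range(1, m) if gcd(k, m) == 1]
euler_value = pow(a, totient(m), m)  # a^phi(m) reduced mod m
print(coprime_residues, totient(m), 5 ** 6, euler_value)
```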

The final question from above, whether $n^7 - 77$ is
ever a Fibonacci number, is left as an exercise to the
reader.


III.61 Modular Forms 

Kevin Buzzard 


1 A Lattice in the Complex Numbers 



When one first learns about the complex numbers, one 
is taught to think of them as a two-dimensional space, 
with one real and one imaginary dimension: a complex 
number z = x + iy has real part x and imaginary part 
y, where i is a square root of -1. 

Now let us consider what the complex numbers that 
have integers for their real and imaginary parts look 
like. These complex numbers, such as $3 + 4i$ or $-23i$,
form a “lattice” in the complex plane (see figure 1). 

By definition, every element of this lattice is of the 
form m + ni for some pair of integers m and n. We 
say that the lattice is generated by 1 and i, and use 
the notation $\mathbb{Z} + \mathbb{Z}i$ for it. Note that this lattice can be
generated in plenty of other ways. For example, it is also
generated by the pair $(1, -i)$, the pair $(1, 100 + i)$, or even
the pair $(101 + i, 100 + i)$. In fact, one can easily check
that this lattice is generated by the pair $(a + bi, c + di)$
(meaning that every element of the lattice is an integer
combination of $a + bi$ and $c + di$) if and only if a, b, c,
and d are integers and $ad - bc = \pm 1$.
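The determinant criterion can be made concrete. The Python sketch below (function names are ours) tests the condition $ad - bc = \pm 1$ and, when it holds, uses Cramer's rule to write a given lattice point $m + ni$ as an integer combination of the two generators.

```python
def generates_gaussian_lattice(a, b, c, d):
    """True if a+bi and c+di generate Z + Zi, i.e. ad - bc is 1 or -1."""
    return a * d - b * c in (1, -1)

def coordinates(m, n, a, b, c, d):
    """Integers (s, t) with s*(a+bi) + t*(c+di) = m+ni, assuming ad - bc = +-1."""
    det = a * d - b * c
    # Cramer's rule; dividing by det = +-1 is the same as multiplying by det.
    s = (m * d - n * c) * det
    t = (a * n - b * m) * det
    return s, t

# The pair (101 + i, 100 + i) from the text.
a, b, c, d = 101, 1, 100, 1
s, t = coordinates(3, 4, a, b, c, d)  # express 3 + 4i in this basis
recombined = s * complex(a, b) + t * complex(c, d)
print(s, t, recombined)
```

Because the determinant is a unit in the integers, the coefficients s and t come out as integers, which is precisely why the pair generates the whole lattice.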


2 More General Lattices 

Now let v and w be any two complex numbers and 
consider the set of complex numbers of the form av + 
bw, again with a and b integers (see figure 2). 

A lattice is exactly such a thing: a grid $\mathbb{Z}v + \mathbb{Z}w$ in the
complex plane generated by two complex numbers v
and w, with the provisos that neither v nor w is zero
and that $v/w$ is not real (this is just to ensure that v
and w do not both lie on one line through the origin).

If $\tau = x + iy$ is a complex number with $y \neq 0$, then
there is a standard lattice associated with $\tau$, namely
$\mathbb{Z}\tau + \mathbb{Z}$. We call this lattice $\Lambda_\tau$ and note that
$\Lambda_\tau = \Lambda_{-\tau}$. In
general, however, distinct complex numbers $\tau$ give rise
to distinct lattices, and furthermore there are plenty
of lattices that are not equal to $\Lambda_\tau$ for any $\tau$, for the
simple reason that 1 belongs to $\Lambda_\tau$ for every $\tau$.

3 Relations between Lattices 

If $\Lambda$ is a lattice generated by v and w, and $\alpha$ is a nonzero
complex number, then one can multiply the entire situation
by $\alpha$ and deduce that $\alpha\Lambda$ is the lattice generated
by $\alpha v$ and $\alpha w$. Geometrically, this says that one can
rotate and rescale lattices.

If $\Lambda$ is a lattice generated by v and w, and we scale
it by dividing everything by w, then we get a new lattice
$(1/w)\Lambda$, which is generated by $v/w$ and $w/w = 1$.
In particular, this new lattice is equal to $\Lambda_\tau$ for the
complex number $\tau = v/w$.

It may seem like an odd thing to do, but one can apply
this scaling trick to $\Lambda_\tau$ itself. The lattice $\Lambda_\tau$ is generated
by $(\tau, 1)$ but also by any pair $(v, w) = (a\tau + b, c\tau + d)$,
if a, b, c, and d are integers such that $ad - bc = \pm 1$.
If we divide by $c\tau + d$ and set $\sigma = (a\tau + b)/(c\tau + d)$,
then we see that

$$\frac{1}{c\tau + d}\,\Lambda_\tau = \Lambda_\sigma. \qquad (1)$$

4 Modular Forms as Functions on Lattices 

The formal definition of a modular form is rather unen- 
lightening: it is a function that obeys certain bound- 
edness conditions and transformation properties. One 
way of seeing where the transformation properties 
come from is to think about lattices. If k is an integer, 
then a modular form of weight k is a function f that
associates a complex number $f(\Lambda)$ with any lattice $\Lambda$,
and has the property that

$$f(\alpha\Lambda) = \alpha^{-k} f(\Lambda). \qquad (2)$$

The function also has to satisfy some other properties
(a differentiability condition and a boundedness condition),
but the crucial property is the one above. If k is
even and at least 4, then an example of a modular form
of weight k is the Eisenstein series $G_k$ defined by the
formula

$$G_k(\Lambda) = \sum_{0 \neq \lambda \in \Lambda} \lambda^{-k}.$$

The assumption that k is at least 4 guarantees that the 
sum converges, and the evenness of k ensures that the 
function is nonzero. 
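A numerical experiment makes property (2) and the role of the parity of k visible. The Python sketch below truncates the Eisenstein sum to lattice points $av + bw$ with $|a|, |b| \le 20$; the truncation radius, the generators v and w, and the scaling factor are all arbitrary choices made for illustration, and the function name is ours.

```python
def eisenstein_truncated(v, w, k, radius=20):
    """Sum of (a*v + b*w)^(-k) over integer pairs (a, b) != (0, 0)
    with |a| <= radius and |b| <= radius."""
    total = 0j
    for a in range(-radius, radius + 1):
        for b in range(-radius, radius + 1):
            if (a, b) != (0, 0):
                total += (a * v + b * w) ** (-k)
    return total

v, w = complex(0.2, 1.1), complex(1.0, 0.0)  # illustrative generators
alpha = complex(1.5, -0.5)                   # illustrative scaling factor
g4 = eisenstein_truncated(v, w, 4)
g4_scaled = eisenstein_truncated(alpha * v, alpha * w, 4)
g3 = eisenstein_truncated(v, w, 3)           # odd weight: terms cancel in pairs
print(abs(g4_scaled - alpha ** (-4) * g4), abs(g3))
```

Scaling the lattice by alpha multiplies every term by alpha to the power -k, so the transformation rule holds even for the truncated sum. For odd k the terms for a lattice point and its negative cancel, which is why only even weights give a nonzero function.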

We have seen that any lattice can be scaled so that
it takes the form $\Lambda_\tau$ for some $\tau$, so (2) implies that a
modular form will be determined by its values on such
lattices. If $\mathbb{H}$ denotes the complex numbers with positive
imaginary part, then, because $\Lambda_\tau = \Lambda_{-\tau}$, a modular
form is in fact determined by its values on $\Lambda_\tau$ for
$\tau \in \mathbb{H}$.

However, an arbitrary function on $\mathbb{H}$ does not give
us a modular form: equation (1) tells us that if f is a
modular form and F is the function on $\mathbb{H}$ defined by
$F(\tau) = f(\Lambda_\tau)$, then F must satisfy the equation

$$F\!\left(\frac{a\tau + b}{c\tau + d}\right) = (c\tau + d)^k F(\tau) \qquad (3)$$

for every $a, b, c, d \in \mathbb{Z}$ such that $ad - bc = 1$. (The
reason we exclude the case $ad - bc = -1$ is that
$(a\tau + b)/(c\tau + d)$ would not be in the upper half-plane in this
case.) This is the equation at the heart of the definition
of a modular form.
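The parenthetical remark about the sign of $ad - bc$ can be checked numerically. In the Python sketch below, the point tau and the two integer matrices are arbitrary examples and `moebius` is a name of our own. The sketch relies on the standard identity that the imaginary part of $(a\tau + b)/(c\tau + d)$ equals $(ad - bc)\,\mathrm{Im}(\tau)/|c\tau + d|^2$.

```python
def moebius(a, b, c, d, tau):
    """The map tau -> (a*tau + b) / (c*tau + d)."""
    return (a * tau + b) / (c * tau + d)

tau = complex(0.3, 1.7)                 # an arbitrary point of the upper half-plane
sigma_plus = moebius(2, 1, 1, 1, tau)   # ad - bc = +1
sigma_minus = moebius(1, 2, 1, 1, tau)  # ad - bc = -1
print(sigma_plus.imag, sigma_minus.imag)
```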

Over the years, mathematicians have isolated other 
desirable properties that F should have in order to give 
a useful theory. Nowadays, modular forms are required 
to obey the additional properties that F is holomorphic
[I.3 §5.6] and that $F(x + iy)$ does not grow too
quickly as y goes to $+\infty$; these assumptions imply that
the vector space of weight k modular forms is finite 
dimensional. The Eisenstein series above do have these 
additional properties, and are the first basic examples 
of modular forms. 

5 Why Modular Forms? 

Modular forms have links with arithmetic, geom- 
etry, representation theory, and even physics. Modular 
forms also played a key role in the Taylor-Wiles proof 
of Fermat’s last theorem [V.12]. Why is this? One
general reason is that there are links between modular 
forms and other mathematical objects: here we briefly 
explain one of the links. 

Lattices in the complex plane are related to elliptic 
curves [III.21]: the quotient of the complex numbers
by a lattice is an elliptic curve, and every elliptic curve 
arises in this way. Hence to study elliptic curves, or fam- 
ilies of elliptic curves, one can instead study families 
of lattices. One way of studying an object is by study- 
ing the functions on that object, and a modular form 
is precisely that: a function on the collection of all lat- 
tices. And indeed, automorphic forms, which are gen- 
eralizations of modular forms, have been used to great 
effect in studying a wide variety of families of algebraic
objects in this way. 


III.62 Moduli Spaces 


An important general problem in mathematics is classification
(see the general goals of mathematical
research [I.4 §2]). Often, one has a set of mathematical
structures and a notion of equivalence, and one would
like to describe the equivalence classes [I.2 §2.3].
For example, two (compact, orientable) surfaces are
often regarded as equivalent if each can be continuously
deformed into the other. Each equivalence class
is then fully described by the genus [III.33], or “number
of holes,” in the surface.

Topological equivalence is rather “crude,” in the 
sense that it is relatively easy for two surfaces to be 
equivalent. As a result, the equivalence classes are 
parametrized by a fairly simple set: the set of all nonnegative
integers. But there are many geometrical contexts
in which finer notions of equivalence are important. 
For example, in several contexts one wishes to regard 
two two-dimensional lattices [III.61] as equivalent if 
one is a rotation and enlargement of the other. Equiv- 
alence relations such as this one often lead to param- 
eter sets that themselves have an interesting geomet- 
rical structure. Such sets are called moduli spaces. For 
details, see [IV.8] and also [V.26]. 


III.63 The Monster Group 


THE CLASSIFICATION OF FINITE SIMPLE GROUPS [V.8] is 
one of the landmarks of twentieth-century mathemat- 
ics. As its name suggests, it gives a complete descrip- 
tion of all finite simple groups, which can be thought of 
as the building blocks for all finite groups. It states that
each finite simple group belongs to one of eighteen
infinite families, or else is one of twenty-six “sporadic”
examples. The Monster group is the largest of the sporadic
simple groups, and has 808 017 424 794 512 875
886 459 904 961 710 757 005 754 368 000 000 000
elements. 

As well as having a starring role in the classification 
theorem, the Monster group has remarkable and deep 
connections with other areas of mathematics. Most 
notably, the smallest dimension of a faithful representation
[IV.12] of the Monster group is 196 883, while
the coefficient of $e^{2\pi i z}$ in the important and famous
“j-function” (see algebraic numbers [IV.3 §8]) is 196 884.
Far from being an amusing coincidence, the fact
that these two numbers differ by just 1 is a manifestation
of a very deep connection between the two. See
vertex operator algebras [IV.13 §4.2] for further
details.


The Navier-Stokes Equation 

See THE EULER AND NAVIER-STOKES 
EQUATIONS [III.23]


III.64 Normed Spaces and Banach 
Spaces 


It is often useful to approximate a function f by a
polynomial P. For example, if you are designing a
pocket calculator and want it to calculate logarithms
[III.25 §4], you cannot expect it to do so exactly, since
a calculator cannot handle infinitely many digits, so
instead you will get it to calculate a different function
$P(x)$ that approximates $\log(x)$ well. Polynomials
are a good choice, because they can be built up from
the basic operations of addition and multiplication.
This idea raises two questions: which functions can
you hope to approximate, and what counts as a good 
approximation? 

Clearly, the answer to the second question deter- 
mines the answer to the first, but there is no single right 
answer to the second: it is up to you what you would like 
to declare to be a good approximation. However, not 
all decisions are equally natural. Suppose that P and Q 
are polynomials, f and g are more general functions,
and x is a real number. If $P(x)$ is close to $f(x)$ and
$Q(x)$ is close to $g(x)$, then $P(x) + Q(x)$ will be close
to $f(x) + g(x)$. Also, if $\lambda$ is a real number and $P(x)$
is close enough to $f(x)$, then $\lambda P(x)$ will be close to
$\lambda f(x)$. This informal argument suggests that the functions
that we can approximate well will form a VECTOR
SPACE [I.3 §2.3].

We have arrived, by one of many possible routes, at 
the following general situation: we are given a vector
space V (consisting, in our case, of certain functions)
and we would like to be able to say, in a precise way, 
what it is for two elements of the vector space to be 
close. 

The notion of closeness is captured by metric 
spaces [III.58], so the obvious approach is to define a 
metric d on the space V. Now a general principle, when 
putting two structures together (in this case, the linear 
structure of the vector space and the distance struc- 
ture of the metric), is that the two structures should 
relate to one another in a natural way. In our case, 
there are two natural properties that one should ask 
for. The first is translation invariance. If u and v are 
two vectors and we translate them by adding w to 
both, then their distance should not change: that is, 
d(u + w, v + w) = d(u,v). The second is that the met- 
ric should scale correctly. For example, if one doubles 
two vectors u and v, then the distance between them 
should double. More generally, if one multiplies u and
v by a scalar $\lambda$, then the distance between them should
multiply by $|\lambda|$: that is, $d(\lambda u, \lambda v) = |\lambda|\, d(u, v)$.

If a metric has the first of these properties, then, setting
$w = -u$, we find that $d(u, v) = d(0, v - u)$. It follows
that if we know distances from 0, then we know
all distances. Let us write $\|v\|$ instead of $d(0, v)$. Then
what we have just shown is that $d(u, v) = \|v - u\|$. The
expression $\|\cdot\|$ is called a norm, and $\|v\|$ is the norm
of v. The following two properties of norms are easy
to deduce from the fact that d is a metric that scales
properly.

(i) For any vector v, $\|v\| \geq 0$. Moreover, $\|v\| = 0$ only
if v = 0.

(ii) For any vector v and any scalar $\lambda$, $\|\lambda v\| = |\lambda|\,\|v\|$.

We also have the so-called triangle inequality.

(iii) $\|u + v\| \leq \|u\| + \|v\|$ for any two vectors u and v.

This follows from translation invariance and the triangle
inequality for metric spaces, since

$$\|u + v\| = d(0, u + v) \leq d(0, u) + d(u, u + v) = d(0, u) + d(0, v) = \|u\| + \|v\|.$$

In general, any function $\|\cdot\|$ on a vector space V that
has properties (i)-(iii) is called a norm on V. A vector
space with a norm on it is called a normed space. Given
a normed space V, we can say that two vectors u and
v are close if their distance $\|v - u\|$ is small.

There are many important examples of normed 
spaces, several of which are discussed elsewhere in this 
volume. One class of examples that stands out is that 
of Hilbert spaces [III.37], which can be thought of as
norms given by distances that stay the same not just
when you translate but also when you rotate. Other
examples are discussed in function spaces [III.29].

Let us return to the problem of how to discuss 
approximation by polynomials. The most commonly 
given answers to the two questions that arose earlier 
are as follows. The functions that one can approximate
well are all continuous functions defined on some
closed interval $[a, b]$ of real numbers. These functions
form a vector space which is denoted $C[a, b]$. To make
the notion of good approximation precise, we introduce
a norm on this space: $\|f\|$ is defined to be the largest
value of $|f(x)|$ for any x in the interval (that is, for
any x between a and b). With this definition, the distance
$\|f - g\|$ between two functions f and g will be
small if and only if $|f(x) - g(x)|$ is small for every x in
the interval. In this situation one says that f uniformly
approximates g. It is not obvious that every continuous
function on $[a, b]$ can be uniformly approximated
by a polynomial: the statement that it can is called the
Weierstrass approximation theorem.
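To see uniform approximation in action, one can use Bernstein polynomials, which are one standard constructive route to the Weierstrass theorem (this article does not mention them; they are brought in here only as an illustration). The Python sketch below works on the interval [0, 1], estimates the uniform distance by sampling on a grid rather than computing a true maximum, and uses function names of our own.

```python
from math import comb

def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1] as a function."""
    def p(x):
        return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
                   for k in range(n + 1))
    return p

def sup_distance(f, g, samples=1001):
    """Estimate max |f(x) - g(x)| on [0, 1] from equally spaced sample points."""
    return max(abs(f(i / (samples - 1)) - g(i / (samples - 1)))
               for i in range(samples))

f = lambda x: abs(x - 0.5)  # continuous, but not differentiable at 1/2
err_10 = sup_distance(f, bernstein(f, 10))
err_100 = sup_distance(f, bernstein(f, 100))
print(err_10, err_100)
```

The estimated distance shrinks as the degree grows, which is what convergence in the norm of $C[0, 1]$ means.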

Here is a different way in which normed spaces arise. 
For most PARTIAL DIFFERENTIAL EQUATIONS [I.3 §5.4] it
is not possible to write down a tidy formula that solves
them. However, there are many techniques for proving
that solutions exist, and they usually involve limiting
arguments. For example, sometimes one can generate
a sequence of functions $f_1, f_2, \dots$ and show that
these functions “converge” to some “limiting function”
f, which, owing to the way we constructed the sequence
$f_1, f_2, \dots$, must be a solution to the equation. Again, if
we want to make sense of this, we must know what it
is for two functions to be close, which means that the
functions $f_n$ should belong to a normed space.

How can we show that these functions converge to a
limit f if we cannot already describe f? The answer is
that most interesting normed spaces, including Hilbert
spaces and most important function spaces, have an
additional property, called completeness, which guarantees,
under certain conditions, that limits do indeed
exist. Informally, it says that if the vectors in a sequence
$v_1, v_2, \dots$ all get very close to each other when you go
far enough along the sequence, then they must converge
to a limit, v, that belongs to the normed space as
well. A complete normed space is known as a Banach
space, after the Polish mathematician Stefan Banach
[VI.84], who developed much of the general theory of
such spaces. Banach spaces have many useful proper- 
ties that normed spaces do not have in general: the 
completeness property can be thought of as ruling out 
pathological examples. 

The theory of Banach spaces is sometimes known 
as linear analysis, since by mixing vector spaces and 
metric spaces it mixes linear algebra and analysis. 
Banach spaces arise throughout modern analysis: see, 
for example, the articles in this volume on partial 
DIFFERENTIAL EQUATIONS [IV.16], HARMONIC ANALYSIS
[IV.18], and operator algebras [IV.19].


III.65 Number Fields 

Ben Green 


A number field K is a “finite-degree field extension” of
$\mathbb{Q}$, the field of rational numbers. This means that K is
a field [I.3 §2.2] that is finite dimensional when one
regards it as a vector space [I.3 §2.3] over $\mathbb{Q}$. The following
alternative description is somewhat more concrete.
Take finitely many algebraic numbers $\alpha_1, \dots, \alpha_k$
(that is, roots of polynomials with integer coefficients)
and consider the field K of all rational functions in
the $\alpha_i$. (In other words, K consists of numbers like
$\alpha_1^2 \alpha_3/(\alpha_2^2 + 7)$.) Then it turns out that K is a number
field (the one thing that is not completely obvious is
that it has finite degree over $\mathbb{Q}$), which we denote by
$\mathbb{Q}(\alpha_1, \dots, \alpha_k)$. Conversely, every number field is of this
form.

The simplest number fields are perhaps the quadratic
fields. These are fields of the form
$\mathbb{Q}(\sqrt{d}) = \{a + b\sqrt{d} : a, b \in \mathbb{Q}\}$, where d is an integer (which, it is
important to stress, may be negative) that is squarefree.
This last condition tells us that d has no nontrivial
square factors. It is there for convenience so that
all the $\mathbb{Q}(\sqrt{d})$ will be distinct. (For example, $\mathbb{Q}(\sqrt{12})$,
if we were to allow it, would equal $\mathbb{Q}(\sqrt{3})$, since
$\sqrt{12} = 2\sqrt{3}$.) Among the other important number fields are the
cyclotomic fields. Here we take a primitive mth root of
unity $\zeta_m$ (which, for concreteness, one could take to be
$e^{2\pi i/m}$) and “adjoin” it to $\mathbb{Q}$, obtaining the field $\mathbb{Q}(\zeta_m)$.

Why consider number fields? Historically, an impor- 
tant reason is that they allow us to factorize certain 
Diophantine equations. For example, the Ramanujan-Nagell
equation $x^2 = 2^n - 7$ may be factorized as

$$(x + \sqrt{-7})(x - \sqrt{-7}) = 2^n$$

if we allow coefficients in the field $\mathbb{Q}(\sqrt{-7})$, while the
Fermat equation $x^n + y^n = z^n$ is equivalent to

$$x^n = (z - y)(z - \zeta_n y) \cdots (z - \zeta_n^{n-1} y) \qquad (1)$$

if we allow coefficients in the field $\mathbb{Q}(\zeta_n)$.

Before one can start thinking about whether such factorizations
are useful, it is necessary to understand the
notion of integer in a number field K. A number $\alpha \in K$
is an (algebraic) integer if it is a root of a monic polynomial
with coefficients in $\mathbb{Z}$: that is, a polynomial with
leading coefficient 1. For simple fields like $\mathbb{Q}(\sqrt{d})$ with
d squarefree, the integers can be described quite explicitly.
They are all the numbers of the form $a + b\sqrt{d}$ for
integers a and b, unless $d \equiv 1 \pmod{4}$, in which case
we must include more numbers: we get all numbers of
the form $a + b\left(\tfrac{1}{2}(1 + \sqrt{d})\right)$, again for integers a and b.
The set of integers in K is often denoted by $\mathcal{O}_K$, and it
forms a ring [III.83 §1].

Unfortunately, factorizations such as (1) are not as 
helpful as they seem at first sight: $\mathcal{O}_K$ turns out not
to be OK, at least if one expects familiar properties
of the ring $\mathbb{Z}$ to carry over unchanged. In particular,
unique factorization into primes fails to hold: for example,
$2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ in the field $\mathbb{Q}(\sqrt{-5})$. The
numbers on both sides are integers in this field, and it
is not possible to decompose any of them any further.
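One way to see why none of these four numbers decomposes further is to use the norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$, which is multiplicative. The article does not spell this argument out, so the Python sketch below is our own illustration. It stores $a + b\sqrt{-5}$ as the integer pair (a, b) and assumes the standard fact that the integers of $\mathbb{Q}(\sqrt{-5})$ are exactly the numbers of this form.

```python
def mul(u, v):
    """Multiply a + b*sqrt(-5) and c + d*sqrt(-5), stored as integer pairs."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)

def norm(u):
    """Multiplicative norm N(a + b*sqrt(-5)) = a^2 + 5*b^2."""
    a, b = u
    return a * a + 5 * b * b

product = mul((1, 1), (1, -1))  # (1 + sqrt(-5)) * (1 - sqrt(-5))
factor_norms = [norm(u) for u in [(2, 0), (3, 0), (1, 1), (1, -1)]]
# A proper factor of any of these would need norm 2 or 3. The box below
# contains every integer pair with a^2 + 5*b^2 <= 3.
small_norms = {norm((a, b)) for a in range(-2, 3) for b in range(-1, 2)}
print(product, factor_norms, sorted(small_norms))
```

Since the four factors have norms 4, 9, 6, and 6, and no element has norm 2 or 3, each factor is irreducible, so 6 really does have two different factorizations.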

Amazingly, unique factorization may be restored 
by embedding $\mathcal{O}_K$ into a larger set, which consists
of objects called ideals [III.83 §2]. There is a natural
equivalence relation [I.2 §2.3] that one can place on
these ideals, and the number of equivalence classes,
called the class number and written $h(K)$, is one of
the most important invariants in number theory: in a
certain sense, it measures “the extent to which unique
factorization fails” in the number field K. (See algebraic
numbers [IV.3 §7] for more details.) The fact that
it is finite is one of the two basic finiteness theorems in 
algebraic number theory. 

When $h(K) = 1$, the integers $\mathcal{O}_K$ themselves enjoy
unique factorization, without the need for extra ideals.
This does not happen particularly often; among the
fields $\mathbb{Q}(\sqrt{-d})$ with d positive and squarefree, only nine
have this property, namely d = 1, 2, 3, 7, 11, 19, 43,
67, and 163. The problem of determining these numbers
was posed by Gauss [VI.26] and finally solved by
Heegner in 1952.

The fact that $h(\mathbb{Q}(\sqrt{-163})) = 1$ is closely related to
some remarkable facts. For example, the polynomial
$x^2 + x + 41$ takes prime values when $x = 0, 1, \dots, 39$
(observe that $4 \times 41 = 163 + 1$), and the number $e^{\pi\sqrt{163}}$
is within $10^{-12}$ of an integer.
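The claim about $x^2 + x + 41$ is easy to confirm by direct computation. The Python sketch below uses a simple trial-division test (`is_prime` is our own helper, adequate only for small inputs) and also shows where the run of primes stops.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

values = [x * x + x + 41 for x in range(40)]
all_prime = all(is_prime(v) for v in values)
first_failure = 40 * 40 + 40 + 41  # equals 41^2, so the run ends at x = 40
print(all_prime, first_failure, is_prime(first_failure))
```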

It is a well-known open problem to decide whether or 
not there are infinitely many fields $\mathbb{Q}(\sqrt{d})$, d > 0, with
class number 1. Gauss and many subsequent authors 
have conjectured that there are. 

The second basic finiteness result in algebraic number
theory is Dirichlet’s unit theorem. A unit is simply
some $x \in \mathcal{O}_K$ such that there exists $y \in \mathcal{O}_K$ with
$xy = 1$. The numbers 1 and $-1$ are always units, but
there can certainly be others: for example, $17 - 12\sqrt{2}$
is a unit in $\mathbb{Q}(\sqrt{2})$ (since its reciprocal is $17 + 12\sqrt{2}$).
The units form an Abelian group $U_K$ under multiplication.
Dirichlet’s theorem states that this group has
finite rank, which means that it is generated by finitely
many of its elements.

If d > 0 is squarefree and if $K = \mathbb{Q}(\sqrt{d})$, then $U_K$
has rank 1. When $d \not\equiv 1 \pmod{4}$, the fact that it has
rank at least 1 is equivalent to the statement that the
Pell equation $x^2 - dy^2 = 1$ always has a nontrivial
solution. This is because the Pell equation factors as
$(x - y\sqrt{d})(x + y\sqrt{d}) = 1$. The unit $17 - 12\sqrt{2}$ in $\mathbb{Q}(\sqrt{2})$
corresponds to the solution x = 17, y = 12 of the
equation $x^2 - 2y^2 = 1$.
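Because units can be multiplied together, one solution of the Pell equation produces infinitely many. The Python sketch below (the function name is ours, and the starting solution $3 + 2\sqrt{2}$ is supplied by hand rather than computed) generates solutions for d = 2 by taking successive powers.

```python
def pell_solutions(d, x1, y1, count):
    """Solutions of x^2 - d*y^2 = 1 obtained from powers of x1 + y1*sqrt(d).

    (x1, y1) is assumed to be a known positive solution.
    """
    solutions = []
    x, y = x1, y1
    for _ in range(count):
        solutions.append((x, y))
        # (x + y*sqrt(d)) * (x1 + y1*sqrt(d)), expanded by hand.
        x, y = x * x1 + d * y * y1, x * y1 + y * x1
    return solutions

sols = pell_solutions(2, 3, 2, 4)  # start from 3 + 2*sqrt(2)
print(sols)
```

The second solution produced is (17, 12), the one discussed above: it comes from the square of $3 + 2\sqrt{2}$.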
For more about some of the topics discussed in this
article, see Fermat’s last theorem [V.12].


III.66 Optimization and Lagrange 
Multipliers 

Keith Ball 


1 Optimization 

Soon after being introduced to calculus, most students 
learn of its application to optimization: that is, to the 
problem of finding the largest or smallest value of a 
given differentiable function, which is usually referred 
to as the objective function. A very helpful observation 
is that if f is an objective function that is maximized
or minimized at x, then the tangent to the graph at
the point $(x, f(x))$ will be horizontal, since otherwise
we can find some value x' close to x for which $f(x')$ is
higher. This means that we can narrow down the search
for the maximum and minimum values of f by looking
just at the values of $f(x)$ for which $f'(x) = 0$.

Now suppose that we have an objective function 
of more than one variable, such as, for example, the 
function 

$$F(x, y) = 2x + 10y - x^2 + 2xy - 3y^2.$$

The “graph” of F is obtained by plotting the values 
F(x,y) of F as heights above the corresponding points 
(x,y) of the plane, so now it is a surface instead of a 
curve. A smooth surface possesses not a tangent line 
at each point, but a tangent plane. If F has a maximum 
value, it will occur at a point where the tangent plane 
is horizontal. 

The tangent plane at each point (x,y) is the graph 
of the linear function that best approximates F near 
( x,y ). For small values of h and k, F(x + h,y + k) will 
be approximately equal to F(x,y) plus a function of 
the form 

$$(h, k) \mapsto ah + bk,$$

that is, $F(x, y)$ plus a linear function of h and k. As
explained in some fundamental mathematical definitions
[I.3 §5.3], the derivative of F at $(x, y)$ is this
linear map. The map can be represented by the pair
of numbers $(a, b)$, which can in turn be thought of as
a vector in $\mathbb{R}^2$. This derivative vector is usually called
the gradient of the function F at the point $(x, y)$ and
is written $\nabla F(x, y)$. In vector notation (writing $\mathbf{x}$ for
$(x, y)$ and $\mathbf{h}$ for $(h, k)$), the approximation to F near
$(x, y)$ is

$$F(\mathbf{x} + \mathbf{h}) \approx F(\mathbf{x}) + \mathbf{h} \cdot \nabla F. \qquad (1)$$


Thus, $\nabla F$ points in the direction in which F increases
most rapidly if you start at $\mathbf{x}$, and the magnitude of $\nabla F$
is the slope of the “graph” of F in this direction.

The components a and b of the gradient can be calculated
using partial differentiation. The number a tells
us how quickly $F(x, y)$ changes as we vary x while
keeping y fixed: so to find a, we differentiate
$F(x, y) = 2x + 10y - x^2 + 2xy - 3y^2$ with respect to x, treating y
as a constant. In this case we get the partial derivative

$$a = \frac{\partial F(x, y)}{\partial x} = 2 - 2x + 2y.$$

Similarly,

$$b = \frac{\partial F(x, y)}{\partial y} = 10 + 2x - 6y.$$

Now, if we want to locate points where the tangent 
plane is horizontal, then we want to find the points at 
which the gradient is zero: that is, the points at which 
the vector (a, b) is the zero vector. So we solve the pair 
of simultaneous equations 


$$2 - 2x + 2y = 0, \qquad 10 + 2x - 6y = 0$$

to get x = 4, y = 3. Thus the only candidate for the
maximum is the point (4, 3), where F takes the value 19.
It can be checked that 19 is indeed the maximum value
of F.
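The calculation can be reproduced exactly in a few lines of Python. In the sketch below the partial derivatives are entered by hand, as in the text, and the linear system is solved by Cramer's rule with exact fractions; the function names are ours.

```python
from fractions import Fraction

def F(x, y):
    """The objective function from the text."""
    return 2 * x + 10 * y - x ** 2 + 2 * x * y - 3 * y ** 2

def grad_F(x, y):
    """Partial derivatives (dF/dx, dF/dy), computed by hand as in the text."""
    return (2 - 2 * x + 2 * y, 10 + 2 * x - 6 * y)

# Solve -2x + 2y = -2 and 2x - 6y = -10 by Cramer's rule.
det = Fraction((-2) * (-6) - 2 * 2)
x = Fraction((-2) * (-6) - 2 * (-10)) / det
y = Fraction((-2) * (-10) - 2 * (-2)) / det
print(x, y, grad_F(x, y), F(x, y))
```

That 19 really is the maximum follows from the identity $F(4 + h, 3 + k) = 19 - (h - k)^2 - 2k^2$, which can be verified by expanding.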


2 The Gradient and Contours 

One of the most common ways of representing surfaces 
(landscapes on maps, for example) is by means of con- 
tour lines, or curves of constant height. In the xy-plane, 
we plot several curves of the form F(x,y) = V, for var- 
ious “representative” values of V. For the function we 
considered earlier, 

$$F(x, y) = 2x + 10y - x^2 + 2xy - 3y^2,$$

the values 0, 8, 14, 18, 19 yield the contour plot shown 
in figure 1. The 14 contour, for example, contains all 
the points at which the surface has height 14. The fig- 
ure indicates that this particular surface is an elliptical 
hump whose peak occurs at (4, 3) and has height 19. 

There is a simple geometrical relationship between
the contour lines and the gradient vector. The vector
equation (1), $F(\mathbf{x} + \mathbf{h}) \approx F(\mathbf{x}) + \mathbf{h} \cdot \nabla F$,
shows that the direction $\mathbf{h}$ in which F is
instantaneously constant is the direction which makes
the scalar product $\mathbf{h} \cdot \nabla F$ equal to 0: the direction
perpendicular to $\nabla F$. At each point, the gradient vector is
perpendicular to the contour through that point. This
fact underlies the method of Lagrange multipliers that
we shall discuss in the next section.


Figure 2 Constrained optimization.

3 Constrained Optimization 
and Lagrange Multipliers 

It often happens that we are interested in the maxi- 
mum or minimum value of an objective function that 
depends upon several variables whose values are constrained
to satisfy certain equations or inequalities.
Consider, for example, the following problem.

Find the maximum value of

$$F(x, y) = 4y - x$$

over all pairs $(x, y)$ satisfying the constraint

$$G(x, y) = x^2 - xy + y^2 - x + y - 4 = 0. \qquad (2)$$

Figure 2 shows the curve in the xy-plane defined by
$G(x, y) = 0$ (an ellipse), and also a number of contour
lines of the function $4y - x$. Our aim is to find the
largest that $4y - x$ can be if $(x, y)$ is a point on the
curve. So we want to find the largest value of V for
which the corresponding contour $4y - x = V$ contains
a point on the curve. The value of V increases as the
lines move up the diagram, and the uppermost line that
touches the curve is the one labeled $4y - x = 7$. So the
maximum value we are looking for is 7, and it occurs at
the point where the line $4y - x = 7$ touches the curve.
It is easy to check that this point is (1, 2).

How could we locate this point algebraically, rather 
than by drawing? The important thing to notice is that 
the optimizing line is tangent to the curve: the line and 
the curve are parallel at their common point. The line 
was chosen to be a contour of the function F. The curve 
is also a contour: the 0 contour of G. From the discus- 
sion in the previous section we know that these con- 
tours are perpendicular to the gradients of F and G, 
respectively (at the point in question). So the two gradient
vectors are parallel to one another and are therefore
multiples of one another: $\nabla F = \lambda \nabla G$, say.

We thus have a way to hunt for solutions to the 
constrained optimization problem 

maximize $F(x,y)$ subject to $G(x,y) = 0$.

We look for a point $(x, y)$ and a number $\lambda$ such that
\[ \nabla F(x,y) = \lambda \nabla G(x,y) \quad\text{and}\quad G(x,y) = 0. \tag{3} \]

For our example (2), the gradient equation gives two
partial derivative equations,

\[ -1 = \lambda(2x - y - 1), \qquad 4 = \lambda(-x + 2y + 1), \]

from which we conclude that

\[ x = \frac{2}{3\lambda} + \frac{1}{3}, \qquad y = \frac{7}{3\lambda} - \frac{1}{3}. \tag{4} \]

If we substitute these values into the equation
$G(x,y) = 0$, then we obtain

\[ \frac{13(1 - \lambda^2)}{3\lambda^2} = 0, \]

which has two solutions: $\lambda = 1$ and $\lambda = -1$. If we substitute
$\lambda = 1$ into (4), we get the point $(1, 2)$ where $F$ is
at its maximum. ($\lambda = -1$ yields the minimum.)
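
The algebra above can be verified with exact rational arithmetic. In this Python sketch, `point_for` is a hypothetical helper that solves the two partial derivative equations for $x$ and $y$ in terms of the multiplier, giving $x = (\lambda + 2)/(3\lambda)$ and $y = (7 - \lambda)/(3\lambda)$; the script then checks both the gradient equations and the constraint for the two candidate multipliers.

```python
from fractions import Fraction

def F(x, y):
    """Objective: 4y - x."""
    return 4 * y - x

def G(x, y):
    """Constraint function; the feasible curve is G = 0."""
    return x**2 - x * y + y**2 - x + y - 4

def point_for(lam):
    """Solve -1 = lam(2x - y - 1), 4 = lam(-x + 2y + 1) for (x, y)."""
    lam = Fraction(lam)
    return ((lam + 2) / (3 * lam), (7 - lam) / (3 * lam))

candidates = {lam: point_for(lam) for lam in (1, -1)}
for lam, (x, y) in candidates.items():
    # Both gradient equations must hold at the candidate point.
    assert lam * (2 * x - y - 1) == -1
    assert lam * (-x + 2 * y + 1) == 4
    print(f"lambda = {lam}: point = ({x}, {y}), G = {G(x, y)}, F = {F(x, y)}")
```

Using `Fraction` rather than floating-point numbers means that the check $G = 0$ is exact rather than approximate.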

The number $\lambda$ that we introduced to solve the problem
is called a Lagrange multiplier. It is possible to
reformulate the problem by defining the Lagrangian

\[ \mathcal{L}(x,y,\lambda) = F(x,y) - \lambda G(x,y) \]

and then condensing the equations (3) into the single
equation

\[ \nabla \mathcal{L} = 0. \]


The reason this works is that if we differentiate $\mathcal{L}$ with
respect to $\lambda$, then we obtain $-G(x,y)$, so requiring this
partial derivative to be zero is equivalent to requiring
$G(x,y)$ to be zero. And asking for the other two partial
derivatives to be zero is equivalent to requiring
that $\nabla F = \lambda \nabla G$. The remarkable fact about this
reformulation is that it has turned a constrained optimization
problem involving $x$ and $y$ into an unconstrained
problem involving $x$, $y$, and $\lambda$.

4 The General Method of Lagrange Multipliers 

In real problems we may wish to optimize a function
$F$ of many variables $x_1, \dots, x_n$ under many constraints
$G_1(x_1, \dots, x_n) = 0$, $G_2(x_1, \dots, x_n) = 0$, ...,
$G_m(x_1, \dots, x_n) = 0$. In this case we introduce a
Lagrange multiplier for each constraint and define the
Lagrangian $\mathcal{L}$ by the formula

\[ \mathcal{L}(x_1, \dots, x_n, \lambda_1, \dots, \lambda_m) = F(x_1, \dots, x_n) - \sum_{i=1}^{m} \lambda_i G_i(x_1, \dots, x_n). \]

The partial derivative of $\mathcal{L}$ with respect to $\lambda_i$ is zero if
and only if $G_i(x_1, \dots, x_n) = 0$. And the partial derivatives
with respect to the $x_i$ will all be zero if and only if
$\nabla F = \sum_{i=1}^{m} \lambda_i \nabla G_i$. This tells us that any direction that is
perpendicular to all the gradients $\nabla G_i$ (and therefore
lies in all their “contour hypersurfaces”) will be perpendicular
to the gradient $\nabla F$ as well, so we cannot find a
direction in which $F$ increases while all the constraints
are satisfied.

Problems of this kind occur frequently in economics,
where the objective function $F$ is a cost (which we are
probably trying to minimize), and the constraints force
us to allocate spending among different items so as to
satisfy certain overall demands. For instance, we might
want to minimize the total cost of supplies of various
different foodstuffs that between them had to satisfy
various nutritional demands. In this case, the Lagrange
multipliers have an interpretation as “notional prices.”
As we have just seen, at the optimum point we have an
equation of the form $\nabla F = \sum_{i=1}^{m} \lambda_i \nabla G_i$. This tells us how
much $F$ will vary as we vary the $G_i$ by small amounts:
that is, it tells us the costs associated with increasing
the various demands.
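
The “notional price” reading can be illustrated with the two-variable example of the previous section rather than an economic model. The sketch below assumes the constraint is relaxed from $G(x,y) = 0$ to $G(x,y) = c$ for a small number $c$ (a hypothetical perturbation not discussed in the text); the gradient equations are unchanged, and substituting their solution into $G = c$ gives $13(1 - \lambda^2)/(3\lambda^2) = c$. The rate of change of the maximum with respect to $c$ should then be close to the multiplier $\lambda = 1$.

```python
import math

def constrained_max(c):
    """Maximum of F = 4y - x on the curve G(x, y) = c, with its multiplier.

    The gradient equations give x = (lam + 2)/(3 lam), y = (7 - lam)/(3 lam);
    putting these into G = c gives 13(1 - lam^2)/(3 lam^2) = c, and the
    positive root corresponds to the maximum for c near 0.
    """
    lam = 1 / math.sqrt(1 + 3 * c / 13)
    x = (lam + 2) / (3 * lam)
    y = (7 - lam) / (3 * lam)
    return 4 * y - x, lam

base_value, lam = constrained_max(0.0)
dc = 1e-6
bumped_value, _ = constrained_max(dc)

# Finite-difference estimate of how fast the maximum grows with c.
sensitivity = (bumped_value - base_value) / dc
print(base_value, lam, sensitivity)
```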

For a further use of Lagrange multipliers, see the 
MATHEMATICS OF TRAFFIC IN NETWORKS [VII.4]. 


III.67 Orbifolds 


If you take a quotient [I.3 §3.3] of the plane $\mathbb{R}^2$ by a
group of symmetries, then you may obtain a manifold
[I.3 §6.9]. For instance, if the group consists of all translations
by an integer vector, then two points $(x, y)$ and
$(z, w)$ are equivalent if and only if $z - x$ and $w - y$ are
both integers, and the quotient space is a torus. However,
if you take instead the group of all rotations about
the origin through a multiple of $\pi/3$, then every point
apart from the origin is equivalent to exactly five others,
while the origin is equivalent only to itself. The result
in this case is not a manifold, because the exceptional
behavior at the origin results in a singularity. However,
it is a well-understood kind of singularity. An orbifold
is, roughly speaking, just like a manifold, except that
whereas manifolds are locally like $\mathbb{R}^n$, orbifolds are
locally like quotients of $\mathbb{R}^n$ by groups of symmetries,
and can therefore have a few singularities. See algebraic
geometry [IV.7 §7] and also mirror symmetry
[IV.14 §7].


III.68 Ordinals


Loosely speaking, the ordinals are what we get if, start- 
ing with 0, we use the following two procedures. We 
can add 1 to whatever we have, and we can “collect 
together” (or “take the limit of”) whatever we have so 
far. So from 0 we would get 1, then 2, then 3, and so 
on. After all of those, we could take their “limit” (i.e.,
the limit of 0, 1, 2, 3, ...), which is called $\omega$. Then we
can add 1, obtaining $\omega + 1$, then $\omega + 2$, and so on. And
then we can take the limit of all of those, to obtain an
ordinal we could write as $\omega + \omega$. And so on. Note that
this final “and so on” carries quite a lot inside it. For
example, the ordinals do not just consist of finite sums
of $\omega$s and natural numbers, since we can take the limit
of $\omega$, $\omega + \omega$, $\omega + \omega + \omega$, ..., which we might call $\omega^2$.

Ordinals arise in two ways (which turn out to be
closely related). First, they give a measure of the “size”
of a well-ordering. A well-ordering on a set is an ordering
in which every (nonempty) subset has a least element.
For example, the set $\{\tfrac12, \tfrac23, \tfrac34, \dots\} \cup \{\tfrac32, \tfrac53, \tfrac74, \dots\}$
in the reals is well-ordered, while the set $\{\dots, \tfrac14, \tfrac13, \tfrac12\}$
is not. The first set is order isomorphic to the ordinals
less than $\omega + \omega$, meaning that there is a bijection that
preserves the order. So one says that the set has order
type $\omega + \omega$.

Ordinals also commonly arise when one wishes to
index transfinite processes. Here “transfinite” means
“going beyond finite.” As an example, suppose that we
wish to “count, in increasing order” the elements of the
well-ordered set above. How would we do it? We would
start with $\tfrac12$, then $\tfrac23$, then $\tfrac34$, and so on. But, at the end of
all time, we would still not have reached elements like
$\tfrac32$ or $\tfrac53$. So we would start again: “at time $\omega$” we would
count $\tfrac32$, then at time $\omega + 1$ we would count $\tfrac53$, and so
on. Thus our counting is complete by time $\omega + \omega$.

For a more detailed explanation of ordinals, includ- 
ing more examples and more on how they arise in 
mathematics, see set theory [IV.l §2]. 


III.69 The Peano Axioms 


Everyone knows what the natural numbers are: 0, 1, 2, 
3, and so on. But how would we make that “and so on” 
precise? Can we look at the way that we reason about 
natural numbers and isolate a few basic principles, or 
axioms, whose consequences do complete justice to our 
intuitive picture of what the natural numbers should 
be? To put it another way, when we are proving some- 
thing about the natural numbers, what assumptions do 
we need in order to get started? 

To answer this question, let us strip things down to
the bare minimum: we have an object called 0, and
an operation $s$, called the successor function, which
we think of intuitively as “adding 1.” In this pared-down
language, we would like to say two things: that
all the numbers $0, s(0), s(s(0)), \dots$ are distinct natural
numbers, and that there are no others.

One simple way is to use the following two axioms. 
The first says that 0 is not a successor: 

(i) For all $x$, $s(x) \neq 0$.

The second says that distinct elements stay distinct
when you take their successors:

(ii) For all $x$ and $y$, if $x \neq y$, then $s(x) \neq s(y)$.

Note that this implies, for example, that $s(s(s(0))) \neq s(0)$,
for if they were equal, then, from rule (ii), we could
deduce that $s(s(0)) = 0$, contradicting rule (i).

Now, how can we say that there are no other natural
numbers? One would like to say that, for every $x$, either
$x = 0$ or $x = s(0)$ or $x = s(s(0))$ or $\cdots$, but that is an
infinitely long statement, and those are definitely not
allowed. After the failure of that very natural attempt,
one might guess that there is no way to achieve the goal,
but in fact there is a brilliant solution: induction. Here
is an axiom that expresses the principle of induction.

(iii) Let $A$ be any subset of the natural numbers with
the following properties: $0 \in A$, and $s(x) \in A$
whenever $x \in A$. Then $A$ must be the set of all
natural numbers.


Note that this does express our intuitive idea that there
are no “extra” natural numbers, since we can take $A$ to
be the set of all the numbers $0, s(0), s(s(0)), \dots$ that
were on our list.

Rules (i), (ii), and (iii) are called the Peano axioms for 
the natural numbers. As explained above, they “charac- 
terize” the natural numbers, in the sense that all rea- 
soning about the natural numbers may be reduced or 
rewritten in such a way that the only assumptions one 
needs are the Peano axioms. 

There is a related system used in logic, called the first- 
order Peano axioms. The idea here is that we want to 
express the Peano axioms in the language of first-order 
logic. This means that we are allowed variables (that 
are interpreted as ranging over the natural numbers), 
as well as the symbols 0 and s, logical connectives, and 
the like, but nothing more: so there is no “member of” 
symbol, and no sets are allowed. (However, for technical 
reasons one does allow symbols for “plus” and “times.”) 

To give an idea of what is allowed and what is not,
consider the statements “there are infinitely many perfect
squares” and “every infinite set of positive integers
contains either infinitely many odd numbers or
infinitely many even numbers.” With a little effort, we
can express the first of these statements in first-order
logic, as follows:

\[ (\forall m)(\exists n)(\exists x)\ xx = m + n. \]

In words, this says that for every $m$ you can find a perfect
square of the form $m + n$ (which is how we express
the fact that it is larger than $m$). However, in order to
express the second statement, we find ourselves wanting
to write $(\forall A)$, where $A$ ranges over all possible subsets
of the natural numbers, rather than all possible
elements: this is the main thing that is not allowed in
first-order logic.
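
To make the quantifier structure of the first statement concrete, here is a small Python sketch that, for each $m$ below an arbitrarily chosen bound, produces witnesses $n$ and $x$ with $xx = m + n$. The function name `witness` and the bound 1000 are illustrative assumptions, and a finite search of this kind can only illustrate the statement, not prove it.

```python
from math import isqrt

def witness(m):
    """Return natural numbers (n, x) with x * x == m + n."""
    x = isqrt(m)
    if x * x < m:
        x += 1                     # smallest x whose square is at least m
    return x * x - m, x

# For each m in a finite range, record the witnesses n and x.
witnesses = {m: witness(m) for m in range(1000)}
print(witnesses[10])               # the pair (n, x) found for m = 10
```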

By this criterion, rules (i) and (ii) are fine, but rule 
(iii) is not. Instead, we have to use an “axiom scheme,”
which is an infinite set of axioms, one for each first-order
statement $p(x)$. So our version of rule (iii) is this:
for each statement p(x), we have an axiom saying that 
if p(0) is true, and p(x) implies p(s(x)), then p(x) is 
true for all x. 

Note that these axioms do not have the full strength 
of the usual Peano axioms. For instance, there are only 
countably many possible formulas p(x), whereas there 
are uncountably many sets $A$. It turns out that in fact
there are “nonstandard” models of these axioms, meaning
structures other than the natural numbers that
satisfy the axioms of first-order Peano arithmetic.

Actually, one also allows parameters in the statements
$p(x)$; for example, $p(x)$ could be the statement
“there exists $z$ with $x = y + z$,” which would correspond
to the set of all natural numbers greater than or equal
to $y$, and would therefore depend on $y$. And one also
adds some axioms saying how plus and times behave 
(for example, commutativity of addition). This whole 
collection of axioms is known as Peano arithmetic, or 
PA for short. 

See model theory [IV.2] for more on some of the 
topics discussed in this article. 


III.70 Permutation Groups

Martin W. Liebeck 


Let $S$ be a set. A permutation of $S$ is a function from
$S$ to $S$ that is both injective and surjective: in other
words, a function that “rearranges” the elements of $S$.
For example, if $S = \{1, 2, 3\}$, then the function $a : S \to S$
that sends 1 to 3, 2 to 1, and 3 to 2 is a permutation of
$S$; so is the function $b$ that sends 1 to 3, 2 to 2, and 3 to
1; whereas the function $c$ that sends 1 to 3, 2 to 1, and 3
to 1 is not a permutation. An example of a permutation
of the set of real numbers $\mathbb{R}$ is the function $x \mapsto 8 - 2x$.

From the point of view of finite group theory, the
most important permutations to study are those of the
set $I_n = \{1, 2, \dots, n\}$, where $n$ is a positive integer.
Let $S_n$ denote the set of all permutations of $I_n$. So, for
example, the permutations $a$ and $b$ defined in the previous
paragraph lie in $S_3$. To count how many permutations
there are altogether in $S_n$, observe that, for a permutation
$f : I_n \to I_n$, there are $n$ choices for $f(1)$, then
$n - 1$ choices for $f(2)$ (we can choose anything different
from $f(1)$), then $n - 2$ for $f(3)$, and so on, until we
have just 1 choice for $f(n)$. Therefore the total number
of permutations in $S_n$ is $n(n-1)(n-2) \cdots 1 = n!$.

If $f$ and $g$ are permutations of a set $S$, their composition
$f \circ g$ is defined by $f \circ g(s) = f(g(s))$ for all $s \in S$,
and it is quite easy to see that $f \circ g$ is also a permutation
of $S$. It is usual to drop the “$\circ$” symbol and write
just $fg$ instead of $f \circ g$. For example, if $a, b \in S_3$ are
as in the first paragraph, then $ab \in S_3$ sends 1 to 2, 2
to 1, and 3 to 3, while $ba$ sends 1 to 1, 2 to 3, and 3 to
2. Notice that $ab \neq ba$.
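
The composition rule is easy to experiment with. In this Python sketch, permutations are represented as dictionaries (a representation chosen here for illustration), and `compose(f, g)` follows the convention of the text, in which $fg$ means “apply $g$ first, then $f$.”

```python
def compose(f, g):
    """Return the permutation fg, i.e. s -> f(g(s))."""
    return {s: f[g[s]] for s in g}

a = {1: 3, 2: 1, 3: 2}   # the permutation a of the first paragraph
b = {1: 3, 2: 2, 3: 1}   # the permutation b of the first paragraph

ab = compose(a, b)
ba = compose(b, a)
print(ab, ba, ab == ba)
```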

For any set $S$, the identity function $\iota : S \to S$, defined
by $\iota(s) = s$ for all $s \in S$, is a permutation of $S$; and if $f$
is a permutation of $S$, then there is an inverse permutation
$f^{-1}$ that sends everything back to where it came
from and therefore satisfies $ff^{-1} = f^{-1}f = \iota$. For
example, the inverse of the above permutation $a \in S_3$
is the permutation that sends 1 to 2, 2 to 3, and 3 to
1. Also, for any permutations $f$, $g$, $h$ of $S$, we have
$f(gh) = (fg)h$, since both sides send any $s \in S$ to
$f(g(h(s)))$.

Thus, the set of all permutations of $S$, together with
the binary operation [I.2 §2.4] of composition, satisfies
the axioms for a group [I.3 §2.1]. In particular, $S_n$ is
a finite group of size $n!$, known as the symmetric group
of degree $n$.

There is a neat way of representing permutations succinctly,
known as the cycle notation. It is best explained
with an example. Let $d \in S_6$ be the permutation $1 \to 3$,
$2 \to 5$, $3 \to 6$, $4 \to 4$, $5 \to 2$, $6 \to 1$. We can represent
this more economically by writing $1 \to 3 \to 6 \to 1$,
$2 \to 5 \to 2$, and $4 \to 4$. We say the symbols 1, 3, 6 form a cycle of $d$ (of
length 3); similarly, 2, 5 form a cycle of length 2, and
4 a cycle of length 1. We then compress our notation
even further and write $d = (1\ 3\ 6)(2\ 5)(4)$, indicating
that each number 1, 3, 6 in the first cycle is sent to the
next one, except for the last which is sent back to the
first, and likewise for the second and third cycles. This
is the cycle notation for $d$; notice that the cycles have
no symbols in common, and for this reason they are called disjoint cycles.
It is not too hard to see that every permutation in $S_n$
can be expressed as a product of disjoint cycles; this
is what we mean by the cycle notation for a permutation.
For example, in cycle notation, the six permutations
of $S_3$ are $\iota$, (1 2)(3), (1 3)(2), (2 3)(1), (1 2 3),
and (1 3 2). (The permutations $a$ and $b$ in the first paragraph
are (1 3 2) and (1 3)(2), respectively.) You might
like to while away a few minutes by working out the
multiplication table of $S_3$.
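
The passage from a permutation to its disjoint cycles is mechanical, as the following Python sketch suggests. The dictionary representation and the function name `cycles` are illustrative choices; each cycle is started at the smallest symbol not yet visited.

```python
def cycles(perm):
    """Disjoint cycles of a permutation given as a dict, as a list of tuples."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, s = [], start
        while s not in seen:       # follow start -> perm[start] -> ... until we return
            seen.add(s)
            cycle.append(s)
            s = perm[s]
        result.append(tuple(cycle))
    return result

d = {1: 3, 2: 5, 3: 6, 4: 4, 5: 2, 6: 1}
d_cycles = cycles(d)
print(d_cycles)
```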

The cycle-shape of a permutation $g$ is the sequence
of numbers we get by writing down the lengths of the
disjoint cycles in the cycle notation for $g$, in decreasing
order. For example, the cycle-shape of the permutation
(1 6 3)(2 4)(5 8)(7)(9) in $S_9$ is (3, 2, 2, 1, 1), or more
succinctly $(3, 2^2, 1^2)$.

One can define the powers of a permutation $f \in S_n$
in a natural way: namely, $f^1 = f$, $f^2 = ff$, $f^3 = f^2 f$,
and so on. For example, if $e = (1\ 2\ 3\ 4) \in S_4$, then
$e^2 = (1\ 3)(2\ 4)$, $e^3 = (1\ 4\ 3\ 2)$, and $e^4 = \iota$. The order
of a permutation $f \in S_n$ is defined to be the smallest
positive integer $r$ such that $f^r = \iota$: that is, the smallest
number of times we have to do $f$ to send everything
back to where it came from. So the order of the 4-cycle
$e$ above is 4. In general, the order of an $r$-cycle (i.e., a
cycle of length $r$) is equal to $r$, and the order of a permutation
in cycle notation is equal to the least common
multiple of the lengths of the (disjoint) cycles.
It is often useful to be able to work out the order of
a permutation. Here is one such instance. Suppose we
shuffle a pack of eight cards in the following way: the
pack is divided into two equal parts and then “interlaced,”
so that if the original order was 1, 2, 3, 4, ...,
the new order is 1, 5, 2, 6, .... How many times must
this shuffle be repeated before the cards are again in
the original order? Well, the shuffle gives the permutation
of the eight card positions sending 1 to 1, 2
to 5, 3 to 2, 4 to 6, and so on, which in cycle notation
is (1)(2 5 3)(4 6 7)(8). This has order 3, so the
cards return to their original order after three shuffles.
Things get quite interesting if we consider the same
problem for different numbers of cards: you might like
to try it yourself with fifty-two cards, for instance.
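
For readers who prefer to experiment by machine, the following Python sketch builds the interlacing shuffle for an even number of cards and computes its order as the least common multiple of its cycle lengths. The function names are hypothetical, and the indexing (position $2j - 1$ receives card $j$, position $2j$ receives card $n/2 + j$) is an assumption chosen to match the eight-card description above.

```python
from functools import reduce
from math import gcd

def interlace(n):
    """Position permutation of the interlacing shuffle on n cards (n even)."""
    half = n // 2
    perm = {}
    for j in range(1, half + 1):
        perm[2 * j - 1] = j          # odd positions take the first half
        perm[2 * j] = half + j       # even positions take the second half
    return perm

def cycle_lengths(perm):
    """Lengths of the disjoint cycles of a permutation given as a dict."""
    seen, lengths = set(), []
    for start in perm:
        length, s = 0, start
        while s not in seen:
            seen.add(s)
            s = perm[s]
            length += 1
        if length:
            lengths.append(length)
    return lengths

def order(perm):
    """Order of a permutation: the least common multiple of its cycle lengths."""
    return reduce(lambda a, b: a * b // gcd(a, b), cycle_lengths(perm), 1)

eight_card_order = order(interlace(8))
print(eight_card_order)
```

Evaluating `order(interlace(52))` answers the fifty-two card question.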

There is one slightly more subtle aspect of permutations
which is important for group theory: namely,
the theory of even and odd permutations. Again, this
is best illustrated by example. Take $n = 3$, and let $x_1$,
$x_2$, $x_3$ be three variables. Let us think of the permutations
in $S_3$ as moving these variables around rather
than the numbers 1, 2, and 3. So, for instance, we
shall take the permutation (1 3 2) to send $x_1$ to $x_3$,
$x_2$ to $x_1$, and $x_3$ to $x_2$. Now let $\Delta$ be the expression
$\Delta = (x_1 - x_2)(x_1 - x_3)(x_2 - x_3)$. We can apply permutations
in $S_3$ to $\Delta$ in an obvious way: for example,
(1 2 3) sends $\Delta$ to $(x_2 - x_3)(x_2 - x_1)(x_3 - x_1)$. Notice
that this is just the expression for $\Delta$ with two of the
brackets, $(x_1 - x_2)$ and $(x_1 - x_3)$, reversed. So (1 2 3)
sends $\Delta$ to $\Delta$. However, if we apply (1 2)(3) to $\Delta$, we
get $(x_2 - x_1)(x_2 - x_3)(x_1 - x_3) = -\Delta$. You can see
that each permutation in $S_3$ sends $\Delta$ to either $+\Delta$ or
$-\Delta$. Call those permutations that send $\Delta$ to $+\Delta$ even
permutations and those that send $\Delta$ to $-\Delta$ odd permutations.
Check that $\iota$, (1 2 3), and (1 3 2) are even, while
(1 2)(3), (1 3)(2), and (2 3)(1) are odd.

The definition of even and odd permutations for general
$n$ is very similar to this example. Let $x_1, \dots, x_n$
be variables, and take the permutations in $S_n$ to
move these variables around rather than the symbols
$1, 2, \dots, n$. Define $\Delta$ to be the product of all $x_i - x_j$ for
$i < j$. Just as in the example, we can apply any permutation
$g \in S_n$ to $\Delta$, and the result will be either
$+\Delta$ or $-\Delta$. Define the signature of $g$ to be the number
$\operatorname{sgn}(g) \in \{+1, -1\}$ such that $g(\Delta) = \operatorname{sgn}(g)\Delta$. This
defines the signature function $\operatorname{sgn} : S_n \to \{+1, -1\}$.
Then a permutation $g \in S_n$ is even if $\operatorname{sgn}(g) = +1$, and
is odd if $\operatorname{sgn}(g) = -1$.

It follows easily from the definition that
\[ \operatorname{sgn}(gh) = \operatorname{sgn}(g)\operatorname{sgn}(h) \]
for any $g, h \in S_n$, and also that the signature of any
2-cycle is $-1$. Since an $r$-cycle $(a_1\ a_2\ \cdots\ a_r)$ can be
expressed as a product $(a_1\ a_r)(a_1\ a_{r-1}) \cdots (a_1\ a_2)$ of
2-cycles, the signature of the $r$-cycle is $(-1)^{r-1}$. Hence,
if $g \in S_n$ has cycle-shape $(r_1, r_2, \dots, r_k)$, then
\[ \operatorname{sgn}(g) = (-1)^{r_1 - 1}(-1)^{r_2 - 1} \cdots (-1)^{r_k - 1}. \]

This makes it easy to work out the signature of any permutation.
For example, the even permutations in $S_5$ are
those that have cycle-shape $(1^5)$, $(2^2, 1)$, $(3, 1^2)$, or $(5)$.
If you count these, you will find that there are sixty even
permutations in $S_5$ altogether, which is exactly half of
the total of $5! = 120$ permutations in $S_5$. In general, the
number of even permutations in $S_n$ is $\tfrac12 n!$.
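
Both descriptions of the signature can be compared by brute force. In the Python sketch below, a permutation of $\{0, 1, \dots, n-1\}$ is stored as a tuple of images (a zero-based convention adopted here for convenience, unlike the text); `sign_by_cycles` uses the cycle-shape formula, while `sign_by_delta` counts how many factors $x_i - x_j$ with $i < j$ are reversed.

```python
from itertools import permutations

def sign_by_cycles(perm):
    """Signature via cycle lengths: an r-cycle contributes (-1)^(r-1)."""
    seen, sgn = set(), 1
    for start in range(len(perm)):
        length, s = 0, start
        while s not in seen:
            seen.add(s)
            s = perm[s]
            length += 1
        if length and length % 2 == 0:
            sgn = -sgn
    return sgn

def sign_by_delta(perm):
    """Signature via the product of differences: count reversed factors."""
    n = len(perm)
    flips = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if flips % 2 else 1

s5 = list(permutations(range(5)))
even_in_s5 = sum(1 for p in s5 if sign_by_cycles(p) == 1)
definitions_agree = all(sign_by_cycles(p) == sign_by_delta(p) for p in s5)
print(even_in_s5, definitions_agree)
```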

So what is the point of this complicated definition?
The answer is that the set of all even permutations in $S_n$
forms a subgroup of size $\tfrac12 n!$, known as the alternating
group of degree $n$, and written as $A_n$. The alternating
groups are very important examples of finite groups,
because of the fact that, for $n \geq 5$, $A_n$ is a simple
group: that is, its only normal subgroups [I.3 §3.3]
are the identity subgroup and $A_n$ itself (see the classification
of finite simple groups [V.8]). For example,
$A_5$ is a simple group of size 60, and in fact is the
smallest non-Abelian finite simple group.


III.71 Phase Transitions


If you heat up a block of ice, then it turns into water.
This very familiar phenomenon is actually rather mysterious,
because it shows that the properties of the
chemical $\mathrm{H_2O}$ do not depend continuously on temperature:
the block of ice goes straight from a solid to a
liquid, rather than doing so by a process of gradual
softening.

This is an example of a phase transition. Phase transi- 
tions tend to occur in systems that involve a large num- 
ber of particles with “local” interactions — that is, where 
the behavior of one particle is directly influenced only
by the particles in its immediate vicinity. 

Such systems can be modeled mathematically, and 
the study of these models belongs to the area known as 
statistical physics. For further discussion of such models,
see probabilistic models of critical phenomena
[IV.26].


III.72 $\pi$


What makes one number more fundamental and important,
mathematically speaking, than another? Why, for
instance, would almost everybody agree that 2 is more
important than $\tfrac{43}{32}$? One possible answer is that what
really matters about a number is its properties, and in
particular any interesting properties it might have that
distinguish it from all other numbers. Of course, we
now have to decide what counts as an interesting property:
for example, why do we not regard it as interesting
that $\tfrac{43}{32}$ is the only number that gives you $\tfrac{43}{16}$ when you
double it? An obvious reason is that there is an analogous
property for every number $x$ you might care to
choose: $x$ is the only number that gives you $2x$ when
you double it. By contrast, the property “is the smallest
prime number” does not mention any specific number
and is easily stated in terms of a concept, that of “prime
number,” whose importance is itself easy to explain.
This property must apply to exactly one number, so it
is likely that that number will have an important part to
play in mathematics, and indeed it does. (As it happens,
$\tfrac{43}{32}$ is conjectured to be an important critical exponent
in statistical physics, which means that it can be singled
out as an interesting number, though still nothing
like as fundamental as 2.)

Everybody agrees that $\pi$ is one of the most important
numbers in mathematics, and it is easy to justify this
assessment by the criterion of the previous paragraph,
because $\pi$ has an abundance of properties: so many
that when $\pi$ appears unexpectedly in a calculation, one
is not unduly surprised. For example, the following is
a famous theorem of Euler:

\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \]

What on earth, one might wonder, has $\pi$ to do with
adding up reciprocals of squares? This is a perfectly
legitimate question, but the idea that there could in
principle be a connection is not, to an experienced
mathematician, a surprise. A very common way to
prove mathematical identities is to show that the two
sides of the identity are different ways of evaluating
the same quantity. In this case, one can use a basic
fact from Fourier analysis [III.27], known as Parseval’s
identity, which states the following. If $f : \mathbb{R} \to \mathbb{C}$ is
a periodic function with period $2\pi$, and for every integer
$n$ (positive or negative) we define its $n$th Fourier
coefficient $a_n$ by the formula

\[ a_n = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx, \]

then

\[ \frac{1}{2\pi} \int_0^{2\pi} |f(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |a_n|^2. \]


If you now take as $f$ the function that is 1 whenever $x$
is between $(2n - \tfrac12)\pi$ and $(2n + \tfrac12)\pi$ for some integer
$n$, and 0 otherwise, then you find that the left-hand side
works out as $\tfrac12$. You also find, after a small calculation,
that $|a_n|^2 = 1/(\pi n)^2$ when $n$ is odd, that $|a_0|^2 = \tfrac14$,
and that $|a_n|^2 = 0$ whenever $n$ is even and nonzero.
Therefore,

\[ \frac12 = \frac14 + \frac{1}{\pi^2} \sum_{n\ \text{odd}} \frac{1}{n^2}. \]

Bearing in mind that $n^2 = (-n)^2$, we can deduce easily that

\[ \sum_{n\ \text{odd},\ n > 0} \frac{1}{n^2} = \frac{\pi^2}{8}. \]

This closely resembles the identity we were trying to
prove, which we can get by noticing that the sum over
positive odd $n$ is equal to $\sum_{n=1}^{\infty} 1/n^2 - \sum_{n=1}^{\infty} 1/(2n)^2$, which is three
quarters of $\sum_{n=1}^{\infty} 1/n^2$. Therefore, $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$.
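
The steps of this argument can be sanity-checked numerically. The Python sketch below uses partial sums up to an arbitrarily chosen cutoff `N` (an assumption; the tails of these series are of size roughly $1/N$), and compares them with $\pi^2/6$, $\pi^2/8$, and the value $\tfrac12$ given by Parseval’s identity.

```python
import math

N = 200_000  # cutoff for the partial sums (arbitrary choice)

all_terms = sum(1 / n**2 for n in range(1, N + 1))      # sum over all n
odd_terms = sum(1 / n**2 for n in range(1, N + 1, 2))   # sum over odd n only

# Right-hand side of Parseval: |a_0|^2 plus the terms for odd n of both signs.
parseval_rhs = 0.25 + 2 * odd_terms / math.pi**2

print(all_terms, math.pi**2 / 6)
print(odd_terms, math.pi**2 / 8)
print(parseval_rhs)
```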

Now we have a reason for the appearance of $\pi$: it
comes up in the formula for the Fourier coefficients.
What is more, its appearance there can be explained as
well. A periodic function on $\mathbb{R}$ is more naturally thought
of as a function defined on the unit circle. The Fourier
coefficient $a_n$ is a certain average defined on the unit
circle, so we have to divide by the length of the circle,
which is $2\pi$.

What, then, is $\pi$? Well, we have just seen what is perhaps
the most elementary definition: it is the ratio of
the circumference of a circle to its diameter. But what
makes $\pi$ so interesting is that it has many different
defining properties. Here are a few more of them.


(i) Define a function $\sin x$ to be equal to the sum of
the power series

\[ x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots. \]

Then $\pi$ is the smallest positive number $x$ such that
$\sin x = 0$. (For more on $\sin x$, see trigonometric
functions [III.94].)

(ii) $\pi = \int_{-1}^{1} \frac{dx}{\sqrt{1 - x^2}}$.

(iii) $\frac{\pi}{2} = \int_{-1}^{1} \sqrt{1 - x^2}\,dx$.

(v) $\sqrt{2\pi} = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx$.

(vi) $\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)$.

The integrals on the right-hand sides of the second
and third properties above are expressions for half
the circumference of the unit circle and half its area,
respectively. So those definitions are analytical expressions
of the geometrical facts that a unit circle has
circumference $2\pi$ and area $\pi$, respectively.

The fifth property tells us what constant to put in
front of $e^{-x^2/2}$ to make it into the famous normal
distribution [III.73 §5]. (Why should $\pi$ come into it? One
can give several reasons. One is that the function $e^{-x^2/2}$
has a special role in Fourier analysis, and so does $\pi$.
Another fundamental property of $e^{-x^2/2}$ is that the function
$f(x,y) = e^{-(x^2 + y^2)/2}$ is rotationally invariant, and
rotations involve circles, which involve $\pi$.)

The last formula above is a remarkable recent discovery
of David Bailey, Peter Borwein, and Simon Plouffe.
The presence of the factor $1/16^k$ leads to a way of calculating
hexadecimal digits of $\pi$ (that is, digits to base 16),
without needing to work out all the earlier digits first. It
has been used to work out digits that are astonishingly
far along the hexadecimal expansion: for example, it is
known that the trillionth hexadecimal digit is 8.
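
As a modest illustration, the Python sketch below simply sums the first few terms of the Bailey, Borwein, and Plouffe series, $\pi = \sum_{k \ge 0} 16^{-k}\bigl(\tfrac{4}{8k+1} - \tfrac{2}{8k+4} - \tfrac{1}{8k+5} - \tfrac{1}{8k+6}\bigr)$, in floating-point arithmetic. It does not perform the digit-extraction trick itself, which additionally relies on modular arithmetic not shown here; the function name and the choice of twelve terms are assumptions.

```python
import math

def bbp_pi(terms):
    """Partial sum of the Bailey-Borwein-Plouffe series for pi."""
    return sum(
        (4 / (8 * k + 1) - 2 / (8 * k + 4) - 1 / (8 * k + 5) - 1 / (8 * k + 6)) / 16**k
        for k in range(terms)
    )

approx = bbp_pi(12)
print(approx, math.pi)
```

Because of the factor $1/16^k$, each extra term contributes roughly one further hexadecimal digit of accuracy.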

A fact that seems paradoxical to many nonmathematicians
is that a number as natural as $\pi$ turns out
to be irrational, and also transcendental [III.43].
However, this is not surprising at all: the defining properties
of $\pi$ are simple, but they do not lead to solutions
of polynomial equations, so it would be extraordinary
if $\pi$ were not transcendental. Similarly, it would be a
major surprise if one could find any pattern in the decimal
digits of $\pi$. Indeed, $\pi$ is conjectured to be normal
to base 10, meaning that every sequence of digits occurs
with about the frequency you would expect: for example,
if you look at pairs of consecutive digits, then you
expect 35 to occur about a hundredth of the time. However,
this conjecture seems to be very hard, and it has
not even been proved that the decimal expansion of $\pi$
contains all the digits from 0 to 9 infinitely often.


III.73 Probability Distributions

James Norris 


1 Discrete Distributions 

When we toss a coin, we have no idea whether it will 
land heads or tails. However, there is a different sense 
in which the behavior of the coin is highly predictable: 
if it is tossed many times, then the proportion of heads 
is very likely to be close to $\tfrac12$.


In order to study this phenomenon mathematically,
we need to model it, and this is done by defining a
sample space, which represents the set of possible outcomes,
and a probability distribution on that space,
which tells you their probabilities. In the case of a coin,
the natural sample space is the set {H, T}, and the obvious
distribution assigns the number $\tfrac12$ to each element.
Alternatively, since we are interested in the number of
heads, we could use the set {0, 1} instead: after one
toss, there is a probability of $\tfrac12$ that the number of
heads is 0 and a probability of $\tfrac12$ that it is 1. More
generally, a (discrete) sample space is simply a set $\Omega$,
and a probability distribution on $\Omega$ is a way of assigning
a nonnegative real number to each element of $\Omega$ in
such a way that the sum of all these numbers is 1. The
number assigned to a particular element of $\Omega$ is then
interpreted as the probability that some corresponding
outcome will occur, the total probability being 1.

If $\Omega$ is a set of size $n$, then the uniform distribution
on $\Omega$ is the probability distribution that assigns a probability
of $1/n$ to each element of $\Omega$. However, it is often
more appropriate to assign different probabilities to
different outcomes. For example, given any real number
$p$ between 0 and 1, the Bernoulli distribution with
parameter $p$ on the set {0, 1} is the distribution that
assigns the number $p$ to 1 and $1 - p$ to 0. This can be
used to model the toss of a biased coin.

Suppose now that we toss an unbiased coin $n$ times.
If we are interested in the outcome of every toss, then
we would choose the sample space consisting of all possible
sequences of 0s and 1s of length $n$. For instance, if
$n = 5$, a typical element of the sample space is 01101.
(This particular element represents the outcome tails,
heads, heads, tails, heads, in that order.) Since there are
$2^n$ such sequences and they are all equally likely, the
appropriate distribution on this space will be the uniform
one, which assigns a probability of $1/2^n$ to each
sequence.

But what if we are interested not in the particular
sequence of heads and tails but just in the total number
of heads? In that case, we could take as our sample
space the set $\{0, 1, 2, \dots, n\}$. The probability that the
total number of heads is $k$ is $2^{-n}$ times the number of
sequences of 0s and 1s that contain exactly $k$ 1s. This
number is

\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \]

so the probability we assign to $k$ is $p_k = \binom{n}{k} 2^{-n}$.

More generally, for a sequence of $n$ independent
experiments, each with the same probability $p$ of success,
the probability of a given sequence of $k$ successes
and $n - k$ failures is $p^k(1-p)^{n-k}$. So, the probability
of having exactly $k$ successes is $p_k = \binom{n}{k} p^k (1-p)^{n-k}$.
This is called the binomial distribution with parameters
$n$ and $p$. It models the number of heads if you toss a
biased coin $n$ times, for example.
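
The binomial probabilities $\binom{n}{k} p^k (1-p)^{n-k}$ are straightforward to tabulate. The Python sketch below (with the illustrative function name `binomial_pmf`) lists them for five tosses of a fair coin and checks that a biased example also sums to 1; the parameter values are arbitrary.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of k = 0, 1, ..., n successes in n independent trials."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

fair = binomial_pmf(5, 0.5)       # five tosses of an unbiased coin
biased = binomial_pmf(10, 0.3)    # ten tosses of a coin with p = 0.3

print(fair)
print(sum(biased))
```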

Suppose we perform such experiments for as long
as we need to in order to obtain one success. When k
experiments are performed, the probability of getting
k - 1 failures followed by a success is $p_k = (1-p)^{k-1} p$.
Therefore, this formula gives us the distribution of the
number of experiments up to the first success. It is
called the geometric distribution of parameter p. In
particular, the number of tosses of a fair coin needed to get
the first head has a geometric distribution of parameter
1/2. Notice that our sample space is now the set of
all positive integers; in particular, it is infinite. So
in this case the condition that the probabilities add up
to 1 is requiring that a certain infinite series (the series
$\sum_{k=1}^{\infty} p_k$) converges to 1.

Now let us imagine a somewhat more complicated
experiment. Suppose we have a radioactive source that
occasionally emits an alpha particle. It is often
reasonable to suppose that these emissions are independent
and equally likely to occur at any time. If the average
number of emissions per minute is λ, say, then what is
the probability that during any given minute there will
be k particles emitted?

One way to think about this question is to divide up
the minute into n equal intervals, for some large n. If n
is large enough, then the probability of two emissions
occurring in the same interval is so small that it can
be ignored, and therefore, since the average number of
emissions per minute is λ, the probability of an
emission during any given interval must be approximately
λ/n. Let us call this number p. Since the emissions are
independent, we can now regard the number of
emissions as the number of successes when we do n trials,
each with probability p of success. That is, we have the
binomial distribution with parameters n and p, where
p = λ/n.

Notice that as n gets larger, p gets smaller. Also, the
approximations just made become better and better. It
is therefore natural to let n tend to infinity and study
the resulting “limiting distribution.” It can be checked
that, in the limit as n → ∞, the binomial probabilities
converge to $p_k = e^{-\lambda} \lambda^k / k!$. These numbers define a
distribution on the set of all nonnegative integers, known
as the Poisson distribution of parameter λ.
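The convergence of the binomial probabilities with p = λ/n to the Poisson probabilities can be observed directly. The sketch below is only illustrative: the helper names and the choice λ = 2 are assumptions made here, not taken from the text.

```python
from math import comb, exp, factorial

def binomial_prob(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_prob(lam, k):
    return exp(-lam) * lam**k / factorial(k)

def largest_gap(n, lam, k_max=8):
    """Largest |binomial - Poisson| over k = 0, ..., k_max, with p = lam / n."""
    return max(abs(binomial_prob(n, lam / n, k) - poisson_prob(lam, k))
               for k in range(k_max + 1))

# The gap shrinks as the minute is cut into more and more intervals.
for n in (10, 100, 10_000):
    print(n, largest_gap(n, 2.0))
```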


2 Probability Spaces 

Suppose that I throw a dart at a dartboard. Not being 
very good at darts, I am not able to say very much about 
where the dart will land, but I can at least try to model 
it probabilistically. The obvious sample space to take 
consists of a circular disk, the points of which represent 
where the dart lands. However, now there is a problem: 
if I look at any particular point in the disk, the prob- 
ability that the dart will land at precisely that point is 
zero. So how do I define a probability distribution? 

A clue to the answer lies in the fact that it seems to
be perfectly easy to make sense of a question such as
“What is the probability that I will hit the bull’s-eye?”
In order to hit the bull’s-eye, the dart has to land in
a certain region of the board, and the probability of
this happening does not have to be zero. It might, for
instance, be equal to the area of the bull’s-eye region
divided by the total area of the board.

What we have just observed is that even if we cannot
assign probabilities to individual points in the sample
space, we can still hope to give probabilities to subsets.
That is, if Ω is a sample space and A is a subset of Ω, we
can try to assign a number P(A) between 0 and 1 to the
set A. This represents the probability that the random
outcome belongs to the set A, and can be thought of as
something like a notion of “mass” for the set A.

For this to work, we need P(Ω) to be 1 (since the
probability of getting something in the sample space
must be 1). Also, if A and B are disjoint subsets of Ω,
then P(A ∪ B) should be P(A) + P(B). From this it follows
that if $A_1, \dots, A_n$ are all disjoint, then $P(A_1 \cup \cdots \cup A_n)$
is equal to $P(A_1) + \cdots + P(A_n)$. Actually, it turns out to
be important that this should be true not just for finite
unions but even for countably infinite [III.11] ones
as well. (Related to this point is the fact that one does
not attempt to define P(A) for every subset A of Ω but
just for measurable subsets [III.57]. For our purposes,
it is sufficient to regard P(A) as given whenever A is a
set we can actually define.)

A probability space is a sample space Ω together with
a function P, defined on all “sensible” subsets A of Ω,
that satisfies the conditions mentioned in the previous
two paragraphs. The function P itself is known as
a probability measure or probability distribution. The
term probability distribution is often preferred when we
specify P concretely.

3 Continuous Probability Distributions

There are three particularly important distributions 
defined on subsets of R, of which two will be discussed 
in this section. The first is the uniform distribution on
the interval [0,1]. We would like to capture the idea
that “all points in [0, 1] are equally likely.” In view of 
the problems mentioned above, how should we do this? 

A good way is to take seriously the “mass” metaphor.
Although we cannot calculate the mass of an object by
adding up the masses of all the infinitely small points
that make up the object, we can assign to those points a
density and integrate it. That is exactly what we shall do
here. We assign a probability density of 1 to each point
in the interval [0,1]. Then we determine the probability
of a subinterval, [1/3, 1/2] say, by calculating the integral
$P([\tfrac13, \tfrac12]) = \int_{1/3}^{1/2} 1\,dx = \tfrac16$. More generally, the
probability associated with an interval [a, b] will just be its
length b - a. The probability of a union of disjoint intervals
will then be the sum of the lengths of those intervals,
and so on.

This “continuous” uniform distribution sometimes 
arises naturally from requirements of symmetry, just 
like its discrete counterpart. It can also arise as a lim- 
iting distribution. For instance, suppose that a hermit 
lives deep in a cave, away from any clocks or sources 
of natural light, and that each “day” he spends lasts 
for a random length of time between twenty-three and 
twenty-five hours. To start with, he will have some idea 
of what the time is, and be able to make statements 
such as, “I'm having lunch now, so it’s probably light 
outside,” but after a few weeks of this regime, he will 
no longer have any idea: any outside time will be just 
as likely as any other. 

Now let us look at a rather more interesting density
function, which depends on the choice of a positive
constant λ. Consider the density function $f(x) = \lambda e^{-\lambda x}$,
defined on the set of all nonnegative real numbers. To
work out the probability associated with an interval
[a, b], we now calculate

\[ \int_a^b f(x)\,dx = \int_a^b \lambda e^{-\lambda x}\,dx = e^{-\lambda a} - e^{-\lambda b}. \]

The resulting probability distribution is called the
exponential distribution with parameter λ. The
exponential distribution is appropriate if we are modeling the
time T of a spontaneous event, such as the time it
takes for a radioactive nucleus to decay, or for the next
spam email to arrive. The reason for this is based on
the assumption of memorylessness: for example, if we
know that the nucleus remains intact at time s, the
probability that it will remain intact until a later time
s + t is the same as the original probability that it would
remain intact to time t. Let G(t) represent the
probability that the nucleus remains intact up to time t.
Then the probability that it remains intact up to time
s + t given that it has remained intact up to time s is
G(s + t)/G(s), so this has to equal G(t). Equivalently,
G(s + t) = G(s)G(t). The only decreasing functions
that have this property are exponential functions
[III.25], that is, functions of the form $G(t) = e^{-\lambda t}$ for
some positive λ. Since 1 - G(t) represents the
probability that the nucleus decays before time t, this should
equal $\int_0^t f(x)\,dx$, from which it is easy to deduce that
$f(x) = \lambda e^{-\lambda x}$.
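The memorylessness identity G(s + t) = G(s)G(t), and the relation between G and the density f, can be checked numerically. This is a small sketch with hypothetical function names and arbitrarily chosen values of λ, s, and t.

```python
from math import exp, isclose

def survival(t, lam):
    """G(t) = e^{-lam t}: probability that the nucleus is intact up to time t."""
    return exp(-lam * t)

def interval_prob(a, b, lam):
    """Probability the exponential distribution gives to [a, b]."""
    return exp(-lam * a) - exp(-lam * b)

lam, s, t = 0.7, 1.5, 2.0

# Surviving to s + t, given survival to s, is as likely as surviving to t.
conditional = survival(s + t, lam) / survival(s, lam)
print(isclose(conditional, survival(t, lam)))

# The probability of decay before time t is the integral of f over [0, t].
print(isclose(interval_prob(0.0, t, lam), 1 - survival(t, lam)))
```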

We shall come to the third, and most important, 
distribution below. 

4 Random Variables, Mean, and Variance 

Given a probability space, an event is defined to be a
(sufficiently nice) subset of that space. For example, if
the probability space is the interval [0, 1] with the
uniform distribution, then the interval [1/2, 1] is an event: it
represents a randomly chosen number between 0 and
1 turning out to be at least 1/2. It is often useful to think
not just about random events, but also about random
numbers associated with a probability space. For
example, let us look once again at a sequence of tosses of a
biased coin that has probability p of coming up heads.
The natural sample space associated with this
experiment is the set Ω of all sequences ω of 0s and 1s.
Earlier, we showed that the probability of obtaining k
heads is $p_k = \binom{n}{k} p^k (1-p)^{n-k}$, and we described that
as a distribution on the sample space {0, 1, 2, ..., n}.
However, it is in many ways more natural, and often
far more convenient, to regard the original set Ω as the
sample space and to define a function X from Ω to R
to represent the number of heads: that is, X(ω) is the
number of 1s in the sequence ω. We then write

\[ P(X = k) = p_k = \binom{n}{k} p^k (1-p)^{n-k}. \]

A function like this is called a random variable. If X is a
random variable and it takes values in a set Y, then the
distribution of X is the function P defined on subsets
of Y by the formula

\[ P(A) = P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\}). \]

It is not hard to see that P is indeed a probability
distribution on Y.

For many purposes, it is enough to know the
distribution of a random variable. However, the notion of a
random variable defined on a sample space captures
our intuition of a random quantity, and it allows us to
ask further questions. For example, if we were to ask
for the probability that there were k heads given that
the first and last tosses had the same outcome, then
the distribution of X would not provide the answer,
whereas our richer model of regarding X as a function
defined on sequences would do so. Furthermore, we
can talk of independent random variables, $X_1, \dots, X_n$
say, meaning that the subset of Ω where $X_i(\omega) \in A_i$
for all i has probability given by the product
$P(X_1 \in A_1) \times \cdots \times P(X_n \in A_n)$ for all possible sets of values
$A_i$.

Associated with a random variable X are two
important numbers that begin to characterize it, called the
mean or expectation E(X) and the variance var(X).
Both these numbers are determined by the
distribution of X. If X takes integer values, with distribution
$P(X = k) = p_k$, then

\[ E(X) = \sum_k k\,p_k, \qquad \operatorname{var}(X) = \sum_k (k - \mu)^2 p_k, \]

where μ = E(X). The mean tells us how big X is on
average. The variance, or more precisely its square root,
the standard deviation $\sigma = \sqrt{\operatorname{var}(X)}$, tells us how far
away X lies, typically, from its mean. It is not hard to
derive the following useful alternative formula for the
variance:

\[ \operatorname{var}(X) = E(X^2) - E(X)^2. \]
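The two expressions for the variance can be compared on a concrete distribution. The sketch below uses exact rational arithmetic and a binomial distribution with n = 6 and p = 1/3; both the function name and the choice of parameters are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def mean_and_variances(dist):
    """dist maps each integer value k to its probability p_k."""
    mu = sum(k * p for k, p in dist.items())
    var_def = sum((k - mu) ** 2 * p for k, p in dist.items())   # sum (k - mu)^2 p_k
    var_alt = sum(k * k * p for k, p in dist.items()) - mu**2   # E(X^2) - E(X)^2
    return mu, var_def, var_alt

n, p = 6, Fraction(1, 3)
binomial = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Mean np = 2; both variance formulas give np(1 - p) = 4/3.
print(mean_and_variances(binomial))
```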

To understand the meaning of the variance, consider 
the following situation. Suppose that one hundred peo- 
ple take an exam and you are told that their average 
mark is 75%. This gives you some useful information, 
but by no means a complete picture of how the marks 
are distributed. For example, perhaps the exam con- 
sisted of four questions of which three were very easy 
and one almost impossible, so that all the marks were 
clustered around 75%. Or perhaps about fifty people got 
full marks and fifty got around half marks. To model 
this situation let the sample space Ω consist of the
hundred people and let the probability distribution be
the uniform distribution. Given a random person ω, let
X(ω) be that person’s mark. Then in the first situation,
the variance will be small, since almost everybody’s
mark is close to the mean of 75%, whereas in the second
it is close to $25^2 = 625$, since almost everybody’s mark
was about 25 away from the mean. Thus, the variance
helps us to understand the difference between the two
situations.

As we discussed at the start of this article, it is known
from experience that the “expected” number of heads
in a sequence of n tosses of a fair coin is around n/2,
in the sense that the proportion is usually close to 1/2.
It is not hard to work out that, if X models the number
of heads in n tosses, that is, if X is binomially
distributed with parameters n and 1/2, then E(X) = n/2.
The variance of X is n/4, so the natural distance scale
with which to measure the spread of the distribution
is $\sigma = \tfrac12\sqrt{n}$. This allows us to see that X/n is close to
1/2 with probability close to 1 for large n, in accordance
with experience.

More generally, if $X_1, X_2, \dots, X_n$ are independent
random variables, then
$\operatorname{var}(X_1 + \cdots + X_n) = \operatorname{var}(X_1) + \cdots + \operatorname{var}(X_n)$.
It follows that if all the $X_i$ have the same
distribution with mean μ and variance $\sigma^2$, then the
variance of the sample average $\bar{X} = n^{-1}(X_1 + \cdots + X_n)$ is
$n^{-2}(n\sigma^2) = \sigma^2/n$, which tends to zero as n tends to
infinity. This observation can be used to prove that, for
any ε > 0, the probability that $|\bar{X} - \mu|$ is greater than ε
tends to zero as n tends to infinity. Thus, the sample
average “converges in probability” to the mean μ.
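One standard way to carry out the last step, which the text does not spell out, is Chebyshev's inequality: a random variable is unlikely to be far from its mean if its variance is small. Applied to the sample average it gives the following bound, written here as a sketch under the finite-variance assumption.

```latex
\[
P\bigl(|\bar{X} - \mu| > \varepsilon\bigr)
  \le \frac{\operatorname{var}(\bar{X})}{\varepsilon^{2}}
  = \frac{\sigma^{2}}{n\,\varepsilon^{2}}
  \longrightarrow 0 \quad \text{as } n \to \infty .
\]
```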

This result is called the weak law of large num- 
bers. The argument sketched above implicitly assumes 
that the random variables have finite variance, but this 
assumption turns out not to be necessary. There is also 
a strong law of large numbers, which states that, with
probability 1, the sample average of the first n
variables converges to μ as n tends to infinity. As its name
suggests, the strong law is stronger than the weak law, 
in the sense that the weak law can be deduced from 
the strong law. Notice that these laws make long-term 
predictions of a statistical kind about the real events 
that we have chosen to model using probability theory. 
Moreover, these predictions can be checked
experimentally, and the experimental evidence confirms them.
This provides a convincing scientific justification for 
our models. 

5 The Normal Distribution and 
the Central Limit Theorem 

As we have seen, for the binomial distribution with
parameters n and p, the probability $p_k$ is given by the
formula $\binom{n}{k} p^k (1-p)^{n-k}$. If n is large and you plot the
points $(k, p_k)$ on a graph, then you will notice that they
lie in a bell-shaped curve that has a sharp peak around
the mean np. The width of the tall part of the curve has
order of magnitude $\sqrt{np(1-p)}$, the standard deviation
of the distribution. Let us assume for simplicity that np
is an integer, and define a new probability distribution
$q_k$ by $q_k = p_{k+np}$. The points $(k, q_k)$ peak at k = 0. If
you now rescale the graph, compressing horizontally
by a factor of $\sqrt{np(1-p)}$ and expanding vertically by
the same factor, then the points will all lie close to the
graph of

\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]

This is the density function of a famous distribution
known as the standard normal distribution on R. It is
also often called the Gaussian distribution.

To put this differently, if you toss a biased coin a large 
number of times, then the number of heads, minus its 
mean and divided by its standard deviation, is close to 
a standard normal random variable. 

The function $(1/\sqrt{2\pi})\,e^{-x^2/2}$ occurs in a huge variety
of mathematical contexts, from probability theory to
Fourier analysis [III.27] to quantum mechanics. Why
should this be? The answer, as it is for many such
questions, is that there are properties that this function has
that are shared by no other function.

One such property is rotational invariance. Suppose
once again that we are throwing a dart at a dartboard
and aiming for the bull’s-eye. We could model this as
the result of adding two independent normal
distributions at right angles to each other: one for the
x-coordinate and one for the y-coordinate (each having
mean 0 and variance 1, say). If we do this, then the
two-dimensional “density function” is given by the formula
$(1/2\pi)\,e^{-x^2/2} e^{-y^2/2}$, which can conveniently be written
as $(1/2\pi)\,e^{-r^2/2}$, where r denotes the length of (x, y).
In other words, the density function depends only on
the distance from the origin. (This is why it is called
“rotationally invariant.”) This very appealing property
holds in more dimensions as well. And it turns out to
be quite easy to check that $(1/2\pi)\,e^{-r^2/2}$ is the only
such function: more precisely, it is the only
rotation-invariant density function that makes the coordinates
x and y into independent random variables of
variance 1. Thus, the normal distribution has a very special
symmetry property.

Properties like this go some way toward explaining
the ubiquity of the normal distribution in
mathematics. However, the normal distribution has an even more
remarkable property, which leads to its appearance
wherever mathematics is used to model disorder in
the real world. The central limit theorem states that,
for any sequence of independent and identically
distributed random variables $X_1, X_2, \dots$ (with finite mean
μ and nonzero finite variance $\sigma^2$), we have

\[ \lim_{n \to \infty} P(X_1 + \cdots + X_n \le n\mu + \sqrt{n}\,\sigma x)
   = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \]

for every real number x. The expected value of
$X_1 + \cdots + X_n$ is nμ and its standard deviation is $\sqrt{n}\,\sigma$, so
another way of thinking about this is to let
$Y_n = (X_1 + \cdots + X_n - n\mu)/(\sqrt{n}\,\sigma)$. This rescales
$X_1 + \cdots + X_n$ to have
mean 0 and variance 1, and the probability becomes
the probability that $Y_n \le x$. Thus, whatever
distribution we start with, the limiting distribution of the sum
of many independent copies is normal (after
appropriate rescaling). Many natural processes can
realistically be modeled as accumulations of small
independent random effects, and this is why many distributions
that one observes, such as the distribution of heights
of adults in a given town, have a familiar bell-shaped
curve.

A useful application of the central limit theorem
is to simplify what look like impossibly complicated
calculations. For example, when the parameter n is
large, the calculation of binomial probabilities becomes
prohibitively complicated. But if X is a binomial
random variable, with parameters n and 1/2, for instance,
then we can write X as a sum $Y_1 + \cdots + Y_n$, where
$Y_1, \dots, Y_n$ are independent Bernoulli random variables
with parameter 1/2. Then, by the central limit theorem,

\[ \lim_{n \to \infty} P\bigl(X \le \tfrac12 n + \tfrac12\sqrt{n}\,x\bigr)
   = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy. \]
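To see how good this approximation is for a moderately large n, one can compare an exact binomial probability with the normal integral. The sketch below is illustrative only: the function names are hypothetical, and the normal distribution function is computed from the error function `math.erf`.

```python
from math import comb, erf, sqrt

def binomial_cdf_fair(n, m):
    """Exact P(X <= m) for X binomial with parameters n and 1/2."""
    return sum(comb(n, k) for k in range(m + 1)) / 2**n

def normal_cdf(x):
    """Integral of the standard normal density from minus infinity to x."""
    return 0.5 * (1 + erf(x / sqrt(2)))

n, x = 1000, 1.0
m = int(n / 2 + 0.5 * sqrt(n) * x)   # largest integer not exceeding n/2 + (sqrt(n)/2) x

# The two numbers are close, as the central limit theorem predicts.
print(binomial_cdf_fair(n, m), normal_cdf(x))
```

The exact sum involves a thousand large binomial coefficients, whereas the right-hand side is a single evaluation of a standard function.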
The real projective plane can be defined in various ways.
One way is to use three homogeneous coordinates: a
typical point is represented as (x, y, z), where not all of x,
y, and z are equal to 0, with the convention that if λ is
a nonzero constant, then (x, y, z) and (λx, λy, λz) are
regarded as equal. Notice that for each (x, y, z) the set
of all points of the form (λx, λy, λz) is the line through
the origin and (x, y, z), and indeed a more geometrical
definition of the real projective plane is that it is the set
of all lines in $\mathbb{R}^3$ that pass through the origin. Each such
line meets the unit sphere in exactly two points, which
are opposite each other, and a third way of defining
the real projective plane is to define opposite points in
the unit sphere to be equivalent and to take the
quotient [I.3 §3.3] of the unit sphere by this equivalence
relation [I.2 §2.3]. A fourth way to define the
projective plane is to start with the usual Euclidean plane and
to add one “point at infinity” for each possible slope
that a line can have. With an appropriate topology, this
defines the projective plane as a compactification
[III.9] of the Euclidean plane.

Taking the third definition, a line in the projective 
plane is defined to be a great circle with its opposite 
points identified. It is then not hard to see that any 
two lines meet in exactly one point (since any two great 
circles meet in exactly two opposite points) and that 
any two points are contained in exactly one line. This 
property can be used to define much more abstract 
generalizations of the notion of a projective plane. 

Similar definitions hold for other fields besides
R and in higher dimensions. For instance, complex
projective n-space is the set of all points of the
form $(z_1, z_2, \dots, z_{n+1})$, where not every $z_i$ is 0, with
$(z_1, z_2, \dots, z_{n+1})$ equivalent to $(\lambda z_1, \lambda z_2, \dots, \lambda z_{n+1})$ if
λ is a nonzero complex scalar. This is the set of all
“complex lines” in $\mathbb{C}^{n+1}$ that pass through the origin.
See some fundamental mathematical definitions
[I.3 §6.7] for more details about projective geometry.


III.75 Quadratic Forms

Ben Green 


A quadratic form is a homogeneous polynomial of
degree 2 in some finite set of unknowns $x_1, x_2, \dots, x_n$:
an example is $q(x_1, x_2, x_3) = x_1^2 - 3x_1x_2 + 4x_3^2$. Here,
the coefficients 1, -3, and 4 are integers, but the idea
generalizes straightforwardly from Z to any ring R.
Since linear functions are undeniably important and 2
is the next positive integer after 1, one might expect
quadratic forms to be important as well, and indeed
they are, in many different branches of mathematics,
including linear algebra itself.

Here are two theorems about quadratic forms.

Theorem 1. If x, y, and z are three points in $\mathbb{R}^d$,
then the distances between them satisfy the triangle
inequality

\[ |x - z| \le |x - y| + |y - z|. \]

Theorem 2. An odd prime p can be written as the sum 
of two squares if and only if it leaves remainder 1 on 
division by 4. 


It is not at first sight clear why theorem 1 has
anything to do with quadratic forms. The reason is that the
square of the Euclidean distance

\[ |x| = \sqrt{x_1^2 + \cdots + x_d^2} \]

is a quadratic form over the real numbers R (here, the
$x_i$ are the coordinates of x). This form is derived from
the inner product

\[ \langle x, y \rangle = x_1 y_1 + \cdots + x_d y_d \]

by taking $|x|^2$ to be $\langle x, x \rangle$. The inner product satisfies
the relations

(i) $\langle x, x \rangle \ge 0$ for all $x \in \mathbb{R}^d$, with equality if and only
if x = 0.

(ii) $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$ for all $x, y, z \in \mathbb{R}^d$.

(iii) $\langle \lambda x, y \rangle = \langle x, \lambda y \rangle = \lambda \langle x, y \rangle$ for all $\lambda \in \mathbb{R}$ and
$x, y \in \mathbb{R}^d$.

(iv) $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in \mathbb{R}^d$.

More generally, any function $\phi(x, y)$ that satisfies
these relations is called an inner product. The triangle
inequality is a consequence of arguably the most
important inequality in mathematics, the Cauchy-Schwarz
inequality [V.22]

\[ |\langle x, y \rangle| \le |x|\,|y|. \]

Not all quadratic forms on $\mathbb{R}^d$ come from inner
products, but they do all come from symmetric bilinear
forms $g : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$. These are functions of two
variables that satisfy all the axioms of an inner
product except possibly (i), the positivity criterion. Given
a quadratic form q(x) = g(x, x), one may recover g
using the polarization identity

\[ g(x, y) = \tfrac12\bigl(q(x + y) - q(x) - q(y)\bigr). \]

This correspondence between quadratic forms and
symmetric bilinear forms works just as well when R
is replaced by any field k, except that there are some
serious technical issues when k has characteristic two
(due to the presence of the fraction 1/2 in the above
formula). In linear algebra one often defines quadratic
forms by first discussing symmetric bilinear forms. The
advantage of this more abstract approach over the
concrete definition we gave at the beginning is that it is not
necessary to specify a basis for $\mathbb{R}^d$.
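The polarization identity can be tried out on the example form $q(x_1, x_2, x_3) = x_1^2 - 3x_1x_2 + 4x_3^2$ from the start of the article. The sketch below uses exact arithmetic; the helper `g` and the test vectors are chosen here for illustration.

```python
from fractions import Fraction

def q(x):
    """The example form q(x1, x2, x3) = x1^2 - 3 x1 x2 + 4 x3^2."""
    x1, x2, x3 = x
    return x1 * x1 - 3 * x1 * x2 + 4 * x3 * x3

def g(x, y):
    """Symmetric bilinear form recovered from q by polarization."""
    total = tuple(a + b for a, b in zip(x, y))
    return Fraction(q(total) - q(x) - q(y), 2)

x, y, z = (1, 2, 3), (-4, 0, 5), (2, -1, 1)
y_plus_z = tuple(a + b for a, b in zip(y, z))

print(g(x, x) == q(x))                       # g(x, x) gives back q(x)
print(g(x, y) == g(y, x))                    # symmetry, relation (iv)
print(g(x, y_plus_z) == g(x, y) + g(x, z))   # additivity, relation (ii)
```

Note that q takes negative values (for instance at (1, 1, 0)), so this g satisfies every axiom of an inner product except positivity.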

If one makes a good choice of basis, then the
quadratic form can be made to look particularly pleasant:
we may always choose a basis in such a way that

\[ q(x) = x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_t^2 \]

for some s and t satisfying 0 ≤ s ≤ t ≤ d. Here
$x_1, \dots, x_t$ are the coefficients of x with respect to the
basis we have carefully chosen. The quantity s - t is
called the signature of the form. When s = d (as is
the case for the form defining the Euclidean distance)
the form is said to be positive definite. Forms that are
not positive definite occur very commonly. For
example, the form $x^2 + y^2 + z^2 - t^2$ is used to define
Minkowski space [I.3 §6.8], which plays a key role in
special relativity.

We turn now to examples of quadratic forms in
number theory, beginning with two very famous theorems
about quadratic forms over the integers Z. The first is
theorem 2, mentioned at the start of the article. It is
due to Fermat [VI.12]. There are many related results
for other binary quadratic forms such as $x^2 + 2y^2$ and
$x^2 + 3y^2$. In general, however, the question of which
primes are represented by $x^2 + ny^2$ is extremely subtle
and interesting, and leads one to class field theory
[V.30].
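Theorem 2 is easy to test by brute force for small primes. The following sketch, with function names chosen here for illustration, checks every odd prime below 500.

```python
from math import isqrt

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def is_sum_of_two_squares(n):
    """True if n = a^2 + b^2 for some integers a, b >= 0."""
    return any(isqrt(n - a * a) ** 2 == n - a * a for a in range(isqrt(n) + 1))

odd_primes = [p for p in range(3, 500) if is_prime(p)]

# An odd prime is a sum of two squares exactly when it is 1 mod 4.
assert all(is_sum_of_two_squares(p) == (p % 4 == 1) for p in odd_primes)

print([p for p in odd_primes if is_sum_of_two_squares(p)][:8])
# [5, 13, 17, 29, 37, 41, 53, 61]
```

Such a check is of course no substitute for a proof; it only illustrates the statement.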

In 1770 Lagrange [VI.22] showed that every natural number
n can be written as a sum of four squares. In fact, the
number of such representations of n, $r_4(n)$ (counting
different orders and signs separately), is given by
the formula

\[ r_4(n) = 8 \sum_{\substack{d \mid n \\ 4 \nmid d}} d. \]

This formula can be explained using the theory of
modular forms [III.61], one of the most important topics
in number theory. Indeed, the generating series

\[ f(z) = \sum_{n=0}^{\infty} r_4(n)\, e^{2\pi i n z} \]

is a theta series, as a result of which it satisfies certain
transformations that identify it as a modular form.
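For small n the formula for $r_4(n)$ can be compared with a direct count. The sketch below counts integer solutions with order and signs distinguished, which is the convention under which the factor 8 is correct; the function names are hypothetical.

```python
from math import isqrt

def r4_brute_force(n):
    """Count integer solutions (a, b, c, d) of a^2 + b^2 + c^2 + d^2 = n."""
    bound = isqrt(n)
    values = range(-bound, bound + 1)
    count = 0
    for a in values:
        for b in values:
            for c in values:
                rest = n - a * a - b * b - c * c
                if rest < 0:
                    continue
                d = isqrt(rest)
                if d * d == rest:
                    count += 1 if d == 0 else 2   # d and -d both count
    return count

def r4_formula(n):
    """8 times the sum of the divisors of n that are not multiples of 4."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)

for n in range(1, 31):
    assert r4_brute_force(n) == r4_formula(n)

print(r4_formula(1), r4_formula(4))   # 8 24
```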

A remarkable theorem of Conway and Schneeberger
states that if a quadratic form $a_1x_1^2 + a_2x_2^2 + a_3x_3^2 +
a_4x_4^2$ with $a_1, \dots, a_4 \in \mathbb{N}$ represents all the positive
integers less than or equal to 15, then it represents
all positive integers. Ramanujan [VI.82] listed fifty-five
such forms; actually, one of his forms did not
represent 15, but the remaining fifty-four forms constitute
the complete list. For example, every positive integer
can be written as $x_1^2 + 2x_2^2 + 4x_3^2 + 13x_4^2$.
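The hypothesis of the theorem is a finite check, so it can be carried out mechanically. The sketch below tests the form quoted above and, for contrast, the form with coefficients (1, 2, 5, 5), which the code finds does not represent 15. The function names are illustrative.

```python
from itertools import product
from math import isqrt

def represented(coeffs, limit):
    """Integers in 1..limit of the form sum(a_i * x_i^2) with integer x_i."""
    ranges = [range(isqrt(limit // a) + 1) for a in coeffs]
    values = set()
    for xs in product(*ranges):
        v = sum(a * x * x for a, x in zip(coeffs, xs))
        if 1 <= v <= limit:
            values.add(v)
    return values

def passes_fifteen_test(coeffs):
    """True if the diagonal form represents every integer from 1 to 15."""
    return set(range(1, 16)) <= represented(coeffs, 15)

print(passes_fifteen_test((1, 2, 4, 13)))   # True
print(passes_fifteen_test((1, 2, 5, 5)))    # False: 15 is missed
```

Only the finite check is performed here; that it implies universality is the content of the theorem.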

Quadratic forms in three variables are more difficult
to treat. Gauss [VI.26] proved that n can be written as
$x_1^2 + x_2^2 + x_3^2$ if and only if n does not have the form
$4^t(8k + 7)$ for integers t and k. It is still not known
exactly which integers can be written as
$x_1^2 + x_2^2 + 10x_3^2$ (this is known as
Ramanujan’s ternary form).
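The three-squares criterion can likewise be compared with a brute-force search over a small range. This is a minimal sketch; the function names are assumptions made for the example, and t and k are taken to be nonnegative.

```python
from math import isqrt

def is_sum_of_three_squares(n):
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            rest = n - a * a - b * b
            c = isqrt(rest)
            if c * c == rest:
                return True
    return False

def has_excluded_form(n):
    """True if n = 4^t (8k + 7) for some integers t, k >= 0."""
    while n > 0 and n % 4 == 0:
        n //= 4
    return n % 8 == 7

# For 1 <= n < 2000, exactly one of the two conditions holds.
assert all(is_sum_of_three_squares(n) != has_excluded_form(n) for n in range(1, 2000))

print([n for n in range(1, 40) if has_excluded_form(n)])   # [7, 15, 23, 28, 31, 39]
```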


From the point of view of prime number theory,
quadratic forms in one variable are the hardest to
understand. For example, are there infinitely many
primes of the form $x^2 + 1$?

Let us mention one final topic, where quadratic forms
over R are studied but where the unknowns $x_1, \dots, x_n$
are replaced by integers. In particular, let us mention
a beautiful result of Margulis, which confirmed a
conjecture of Oppenheim. One instance of the result is the
following: for any ε > 0, one may find integers $x_1$, $x_2$,
and $x_3$ such that

\[ 0 < |x_1^2 + x_2^2\sqrt{2} - x_3^2\sqrt{3}| < \varepsilon. \]

The proof uses techniques from ergodic theory
[V.11], which in related contexts are proving very
influential at the forefront of research today. No explicit
bounds are known on how large $x_1$, $x_2$, and $x_3$ need
to be.

III.76 Quantum Computation 

A quantum computer is a theoretical device that makes 
use of the phenomenon of “superposition” in quantum 
mechanics to carry out certain computations in a way 
that is fundamentally different from any known classi- 
cal methods, and in a few important cases remarkably 
efficient. In classical physics, if there is some property 
that a particle could have, then either it has it or it does 
not. But according to quantum mechanics, it can exist 
in a sort of indeterminate state that is a linear
combination of several states, in some of which it might
have the property in question and in others not. The
coefficients in this linear combination are called
probability amplitudes: the modulus squared of the
coefficient associated with a state tells you the probability
of finding that the particle is in that state if you do a
measurement.

Exactly what happens when you take a measurement
is puzzling, and the subject of much debate among
physicists and philosophers. Fortunately, however, one 
can understand quantum computation without solving 
the measurement problem, as it is called: indeed, one 
can get away with not understanding quantum mechan- 
ics at all. (Similarly, and for similar reasons, one could 
in principle do significant work in theoretical com- 
puter science without having the slightest idea what a 
transistor is or how it works.) 

To understand quantum computation it is helpful to
look at two other models of computation. The notion of
a classical computation is a mathematical distillation of
what actually goes on inside your computer. The “state”
of a computer at any one time is modeled by an n-bit
string: that is, a sequence of 0s and 1s of length n. Let
us write σ for a typical string and $\sigma_1, \sigma_2, \dots, \sigma_n$ for the
bits that make it up. A “computation” is a sequence of
very simple operations performed on the initial string.
For example, one operation might be to choose three
numbers i, j, and k, all less than n, and change the kth
bit $\sigma_k$ of the current state σ to 1 if $\sigma_i = \sigma_j = 1$ and
to 0 otherwise. What makes an operation such as this
“simple” is that it is local in character: what it does to
σ depends on, and affects, just a bounded number of
bits of σ (in this case it depends on two bits and affects
one). The “state space” of a classical computer, in this
model, is the set $\{0, 1\}^n$ of all possible n-bit strings,
which we shall denote by $Q_n$.
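The local operation just described can be written out in a few lines. In this sketch a state is stored as a Python string and positions are counted from 1, to match $\sigma_1, \dots, \sigma_n$; the function name `and_gate` is a label chosen here, not one used in the text.

```python
def and_gate(state, i, j, k):
    """Set bit k to 1 if bits i and j are both 1, and to 0 otherwise.

    `state` is an n-bit string such as "01101"; positions start at 1.
    Only bits i and j are read and only bit k is written.
    """
    bits = list(state)
    bits[k - 1] = "1" if bits[i - 1] == "1" and bits[j - 1] == "1" else "0"
    return "".join(bits)

print(and_gate("01101", 2, 3, 1))   # bits 2 and 3 are both 1: "11101"
print(and_gate("01101", 1, 3, 2))   # bit 1 is 0, so bit 2 becomes 0: "00101"
```

A computation in this model is then just a sequence of such calls applied to the initial string.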

After a certain number of stages, we declare the com- 
putation to have finished. At this point we perform a 
simple sequence of “measurements” on the final state, 
which consist in looking at the bits of the string we have 
ended up with. If our problem is a “decision problem,” 
then we will typically organize the computation so that 
all we need to look at is a single bit: if it is 0 then the 
answer is no and if it is 1 then the answer is yes. 

If the ideas of the last two paragraphs are unfamiliar 
to you, then you are strongly advised to read the first 
few sections of computational complexity [IV.21] 
before continuing with this article. 

The next model we shall consider is probabilistic com- 
putation. This is just like classical computation except 
that at each stage we are allowed to toss a (possibly 
biased) coin and let the simple operation we perform 
depend on the outcome of the toss. For instance, we 
might again choose three numbers i, j, and k, but this 
time proceed as follows: with probability 2/3 we perform
the operation described earlier, and with probability
1/3 we change $\sigma_k$ to $1 - \sigma_k$. Remarkably, introducing
randomness into algorithms can be extremely helpful. 
(Equally remarkably, there are strong theoretical
reasons for believing that all algorithms that use
randomness can in fact be “derandomized.” See [IV.21 §7.1] for
details.)

Suppose that we allow our randomized probabilistic
computation to run for k steps and that we do not
examine the result. How should we model the current
state of the computer? We could use exactly the same
definition as in the classical case (a state is an n-bit
string) and simply say that the computation is in a
state that we cannot know until we do a measurement.
But the state of the computer is not a complete mystery:
for each n-bit string σ there will be some probability
$p_\sigma$ that the state is σ. In other words, it is better to
think of the state of the computer as a probability
distribution [III.73] on $Q_n$. This probability
distribution will depend on the initial string, and therefore it
can in principle give us useful information about that
string.

Here is how to use a randomized computation to
solve a decision problem. Let us write $P(\sigma)$ for the
probability that a certain bit (without loss of generality
the first) is 1 at the end of the computation, when
the initial string is $\sigma$. Suppose we can arrange for $P(\sigma)$
to be at least $a$ for all strings $\sigma$ for which the answer is
yes, and at most some smaller number $b$ for all strings
$\sigma$ for which the answer is no. Let $c$ be the average of
$a$ and $b$. Now run the computation $m$ times for some
large m. With very high probability, if the answer is yes 
then when we have finished the first bit will have been 
1 more than cm times, and if the answer is no then it 
will have been 1 fewer than cm times. So we can solve 
the decision problem, not with certainty, but at least 
with a negligibly small chance of error. 
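
The repetition argument above can be simulated directly. The following is a minimal sketch, not part of the original text: the function names, the seed, and the values a = 0.6, b = 0.4, m = 2000 are all assumptions chosen for illustration, and the "computation" is replaced by a biased coin whose bias plays the role of the probability that the first bit is 1.

```python
import random

def run_once(p_one, rng):
    """One run of a hypothetical randomized computation.

    Returns the first output bit, which is 1 with probability p_one.
    """
    return 1 if rng.random() < p_one else 0

def decide(p_one, a, b, m, rng):
    """Run the computation m times; answer yes iff the first bit was 1
    more than c*m times, where c is the average of a and b."""
    c = (a + b) / 2
    ones = sum(run_once(p_one, rng) for _ in range(m))
    return ones > c * m

rng = random.Random(0)          # fixed seed so the experiment is repeatable
a, b, m = 0.6, 0.4, 2000        # illustrative values only

# Twenty independent decisions for a "yes" input and for a "no" input.
yes_answers = [decide(a, a, b, m, rng) for _ in range(20)]
no_answers = [decide(b, a, b, m, rng) for _ in range(20)]
```

With these values the threshold cm is 1000, while the expected number of 1s is 1200 for a yes input and 800 for a no input, roughly nine standard deviations away, which is why a wrong answer is so unlikely.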

The “state space” of a probabilistic computer consists
of all possible probability distributions on $Q_n$,
or equivalently all possible functions $p : Q_n \to [0,1]$
such that $\sum_{\sigma \in Q_n} p_\sigma = 1$. The state space of a quantum
computer also consists of functions defined on
$Q_n$, but there are two differences. First, they can take
complex as well as real values. Second, if $\lambda : Q_n \to \mathbb{C}$
is a state, then the requirement on the size of $\lambda$ is
that $\sum_{\sigma \in Q_n} |\lambda_\sigma|^2 = 1$. In other words, $\lambda$ is a unit
vector in the Hilbert space [III.37] $\ell_2(Q_n, \mathbb{C})$ rather than
a nonnegative unit vector in the Banach space [III.64]
$\ell_1(Q_n, \mathbb{R})$. The scalars $\lambda_\sigma$ are the probability
amplitudes mentioned earlier. We shall explain what this
means later.

Among the possible states of a quantum computer
are the “basis states,” which are the functions that take
the value 1 at one string and 0 everywhere else. It is
customary to use Dirac’s “bra” and “ket” notation for
these, writing $|\sigma\rangle$ if the string in question is $\sigma$. Other
“pure states” are then linear combinations of these, and
Dirac’s notation is again used. For instance, if $n = 5$,
then one fairly simple state that the computer could be
in is $|\psi\rangle = (1/\sqrt{2})|01101\rangle + (i/\sqrt{2})|11001\rangle$.

To get from one state to another, we again apply
“local” operations, but adapted to the new, Hilbert
space context. Suppose first that we have a basis state
$|\sigma\rangle$. Again we look at a very small number of bits. If, for
instance, we look at three bits, at $i$, $j$, and $k$, then there
are eight possibilities for the triple $\tau = (\sigma_i, \sigma_j, \sigma_k)$,
which we could think of as the basis states in a much
smaller state space: the space of all functions $\mu : Q_3 \to \mathbb{C}$
such that $\sum_{\tau \in Q_3} |\mu_\tau|^2 = 1$. The obvious operations
that take unit vectors to unit vectors in a complex
Hilbert space are the unitary maps [III.52 §3.1], and
these are indeed what are used.

Let us illustrate this with an example. Suppose that
$n = 5$, and that $i$, $j$, and $k$ are 1, 2, and 4. One possible
operation on these three bits would send $|000\rangle$ to
$(|000\rangle + i|111\rangle)/\sqrt{2}$ and $|111\rangle$ to $(i|000\rangle + |111\rangle)/\sqrt{2}$,
leaving all other three-bit sequences as they are. If
our initial basis state is $|00100\rangle$, then the first, second,
and fourth bits are in the state $|000\rangle$, so the
resulting state at the end of the operation would be

$(|00100\rangle + i|11110\rangle)/\sqrt{2}$.
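
The effect of such a local operation can be checked mechanically. The sketch below is an illustration under stated assumptions rather than anything from the original text: a state is stored as a dictionary from bit strings to complex amplitudes, the names GATE and apply_gate are hypothetical, and the starting string 00100 is chosen because its first, second, and fourth bits are all 0.

```python
import math

S = 1 / math.sqrt(2)

# The three-bit operation described in the text: it mixes |000> and |111>
# and leaves the other six three-bit strings unchanged.
GATE = {
    "000": {"000": S, "111": 1j * S},
    "111": {"000": 1j * S, "111": S},
}

def apply_gate(state, positions):
    """Apply GATE to the bits at the given 1-indexed positions.

    `state` maps n-bit strings to complex amplitudes. By linearity we
    transform each basis string separately and add up the results.
    """
    result = {}
    for sigma, amp in state.items():
        tau = "".join(sigma[p - 1] for p in positions)
        # Strings not listed in GATE are sent to themselves.
        for new_tau, coeff in GATE.get(tau, {tau: 1}).items():
            bits = list(sigma)
            for p, bit in zip(positions, new_tau):
                bits[p - 1] = bit
            key = "".join(bits)
            result[key] = result.get(key, 0) + amp * coeff
    return result

# Start in a single basis state and act on bits 1, 2, and 4.
out = apply_gate({"00100": 1}, (1, 2, 4))
```

Here `out` has amplitude 1/√2 on the string 00100 and i/√2 on 11110; the third and fifth bits are untouched because the operation never looks at them.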

Now that we have explained what a basic operation
does to a basis state, we have in fact explained what
it does in general, since the basis states form a basis
of the state space. In other words, if you start with a
linear combination (or superposition) of basis states,
you apply the operation described above to each basis
state and take the corresponding linear combination of
the results.

Thus, an elementary operation of quantum computation
consists in acting on the state space by means of
a very special sort of unitary map. If the operation is
on $k$ bits (where $k$ is typically very small indeed), then
the matrix of this map will be a diagonal sum of $2^{n-k}$
copies of the $2^k \times 2^k$ unitary matrix used to manipulate
those $k$ bits (if the basis elements are appropriately
ordered). A quantum computation is a sequence
of these elementary operations.

Measuring the result of a quantum computation is
more mysterious. The basic idea is simple: we do a certain
number of elementary operations and then look at
one of the bits of the resulting state. But what does this
mean, when the state is not a basis state but rather a
superposition of such states? The answer is that when
we “measure” the kth bit of the output, we are doing a
probabilistic process that is somewhat different from
the measurement of a probabilistic computation: if the
output state is $\sum_{\sigma \in Q_n} \lambda_\sigma |\sigma\rangle$, then the probability that
we observe 1 is the sum of all $|\lambda_\sigma|^2$ such that the kth
bit of $\sigma$ is 1, and the probability that we observe 0 is
the same sum but over those $\sigma$ for which the kth bit
is 0. This is why the numbers $\lambda_\sigma$ are called probability
amplitudes. In order to get a useful answer from a
quantum computation, one runs it several times, just
as with a probabilistic computation.
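
The measurement rule is a one-line sum. As a small sketch (the dictionary representation and the function name are assumptions made for illustration), here it is applied to the five-bit state with amplitude 1/√2 on 01101 and i/√2 on 11001 that was used as an example earlier:

```python
import math

def prob_bit_is_one(state, k):
    """Probability of observing 1 when the k-th bit (1-indexed) is measured.

    `state` maps n-bit strings to complex amplitudes whose squared
    moduli sum to 1.
    """
    return sum(abs(amp) ** 2
               for sigma, amp in state.items()
               if sigma[k - 1] == "1")

s = 1 / math.sqrt(2)
psi = {"01101": s, "11001": 1j * s}

# One probability for each of the five bit positions.
probabilities = [prob_bit_is_one(psi, k) for k in range(1, 6)]
```

Up to rounding, the five probabilities are 0.5, 1, 0.5, 0, and 1: the second and fifth bits are 1 in both strings, the fourth is 0 in both, and the first and third differ between the two strings.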


Note the following two important differences between
a quantum computation and a probabilistic computation.
We described the state of a probabilistic computation
as a probability distribution on $Q_n$, which one
could also call a convex combination of basis states. But
this probability distribution is not telling us what is in
the computer: that is a basis state. Rather, it is describing
our knowledge about what is in the computer. By
contrast, the state of a quantum computer really is a
unit vector in a $2^n$-dimensional Hilbert space. So in a
certain sense a huge amount of computation can go
on in parallel: this is what gives quantum computation
its power. Although we cannot know much about the
computation, since a single measurement causes it to
“collapse,” we can hope to organize it so that different
parts of it “interfere” with each other. This “interference”
is related to the second main difference, which is
the fact that we deal with probability amplitudes rather
than probabilities. Roughly speaking, a quantum computation
can “split up” and “reassemble itself,” whereas
once a probabilistic computation splits up it stays split
up. Crucial to the reassembly process in a quantum
computation is cancelation of probability amplitudes:
to give an extreme example, if you multiply a typical
unitary matrix by its inverse, then there is a huge
amount of cancelation to get all the off-diagonal entries
of the resulting matrix to be zero.
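
A one-bit toy case makes the contrast between "splitting up and reassembling" and "staying split up" concrete. The example below is not from the original text: it uses the 2 × 2 matrix with entries ±1/√2 (commonly called the Hadamard matrix) as an assumed example of a unitary map, and a fair coin toss as the corresponding probabilistic step.

```python
import math

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]            # unitary: acts on amplitudes
FAIR_COIN = [[0.5, 0.5], [0.5, 0.5]]    # stochastic: acts on probabilities

def apply(matrix, vector):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [sum(matrix[r][c] * vector[c] for c in range(2)) for r in range(2)]

# Quantum: two applications bring the state back to the basis state |0>,
# because the two contributions to the amplitude of |1> cancel.
amplitudes = apply(HADAMARD, apply(HADAMARD, [1, 0]))

# Probabilistic: after one fair toss the bit is uniformly random,
# and a second toss cannot undo that.
probabilities = apply(FAIR_COIN, apply(FAIR_COIN, [1, 0]))
```

The amplitude of the second basis state ends up as 1/2 − 1/2 = 0, whereas the two probabilities 1/4 + 1/4 can only add, so the distribution stays at (0.5, 0.5).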

All this raises two obvious questions: what are quantum
computers good for, and can they actually be built?
It turns out that a quantum computer can carry out
classical and probabilistic computations, so the first
question is asking whether they can do anything further.¹
One might think so, since the state space is so
much bigger than it is for a classical computation (it is
$2^n$-dimensional rather than merely $n$-dimensional), and
the reassembly process means that we can potentially
afford to visit remote parts of the state space, where all
coefficients might be of very similar (and small) magnitudes,
and come back again to a state where a useful
measurement can be made. However, the very vastness
of this space means that most states are completely
inaccessible unless one is prepared to use a vast number
of basic operations. Additionally, it is important
that at the end of the computation the output should
not be a “typical” state, since only very special states
give rise to useful measurements.


1. It is also possible to simulate a quantum computation classically, 
but it would take an absurdly long time to do so: quantum computers 
cannot calculate noncomputable functions, but they may be far more 
efficient at calculating some computable ones.

These arguments show that if a quantum computation
is to be useful, then it will have to be very
carefully (and cleverly) organized. However, there is a
spectacular example of just such a computation: Peter
Shor’s use of a quantum computer to calculate fast
Fourier transforms [III.26] extremely rapidly. The
fast Fourier transform has a symmetry that allows the
calculation to be split up and carried out “in parallel”
(it might be better to say “in superposition”) in
a way that is ideally suited to a quantum computer.
A super-fast Fourier transform can then be used to
solve (by classical methods) some famous computational
problems, such as the discrete logarithm problem
and the factorization of large integers. The latter
can then be used to break a public-key cryptosystem,
the encryption method that lies at the heart of modern
computer security. (See mathematics and cryptography
[VII.7 §5] and computational number theory
[IV.5 §3] for further discussion of these problems.)

Can a machine be built that would actually be able
to do this? There are formidable problems to overcome,
arising from a phenomenon in quantum mechanics
known as “decoherence,” which makes it very hard
to stop a complicated state from “collapsing” to a simpler
one that is no longer of use. Some progress has
been made, but it is too early to say whether, or when, a
quantum computer will be built that can factorize large
numbers quickly.

Nevertheless, the theoretical challenges raised by the
notion of a quantum computer are fascinating. Perhaps
the most interesting one is very simple: find an application
of quantum computers that is significantly different
from the few that have already been found. The fact
that quantum computers can factorize large numbers
is strong evidence that they are more powerful, but it
would be good to have a better understanding of why.
(It is known that quantum computers are better for
some other uses, such as communication complexity
[IV.21 §5.1.4].) Is there a much simpler task that
is easy for quantum computers and difficult for classical
computers, at least if some well-known plausible
hypothesis is true about what classical computers cannot
do? Can quantum computers solve NP-complete
[IV.21 §4] problems? The majority opinion is that they
cannot, and indeed the statement that they cannot is
becoming another of the many “plausible hypotheses”
of complexity theory, but it would be good to have
stronger reasons for believing in this statement, such as
a proof subject to already-known plausible hypotheses
in classical computation.


III.77 Quantum Groups

Shahn Majid 


There are at least three different paths that lead to the 
objects known today as quantum groups. They could 
be summarized briefly as quantum geometry, quantum 
symmetry, and self-duality. Any one of them would be 
a great reason to invent quantum groups and each of 
them had a role in the development of the modern 
theory. 

1 Quantum Geometry 

One of the great discoveries in physics in the twen- 
tieth century was that classical mechanics should 
be replaced by quantum mechanics, in which the 
space of possible positions and momenta of a par- 
ticle is replaced by the formulation of position and 
momentum as mutually noncommuting operators. This 
noncommutativity underlies Heisenberg’s “uncertainty 
principle,” but it also suggests the need for a more gen- 
eral notion of geometry in which coordinates need not 
commute. One approach to noncommutative geometry 
is discussed in operator algebras [IV.19 §5]. How- 
ever, another approach is to note that geometry really 
grew out of examples such as spheres, tori, and so 
forth, which are lie groups [III.50 §1] or objects closely 
related to Lie groups. If one wants to “quantize” geom- 
etry, one should first think about how to generalize 
basic examples like this: in other words, one should 
try to define “quantum Lie groups” and associated 
“quantum” homogeneous spaces. 

The first step is to consider geometrical structures
not so much in terms of their points but in terms of
corresponding algebras. For example, the group $SL_2(\mathbb{C})$
is defined as the set of $2 \times 2$ matrices
$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ of complex
numbers such that $\alpha\delta - \beta\gamma = 1$. We can think
of this as a subset of $\mathbb{C}^4$, and indeed not just a subset
but a variety [III.97]. The natural class of functions
associated with this variety is the set of polynomials
in four variables (which are defined on $\mathbb{C}^4$) restricted
to the variety. However, if two polynomials take equal
values on the variety, then we identify them. In other
words, we take the algebra of polynomials in four variables
$a$, $b$, $c$, and $d$ and quotient [I.3 §3.3] by the
ideal [III.83 §2] generated by the polynomial
$ad - bc - 1$. (This construction is discussed in
detail in arithmetic geometry [IV.6].) Let us call the
resulting algebra $\mathbb{C}[SL_2]$.

We can do the same for any subset $X \subset \mathbb{C}^n$ that
is defined by polynomial relations. This gives us a
precise one-to-one correspondence between subsets of
this type and certain commutative algebras equipped
with $n$ generators. Let us write $\mathbb{C}[X]$ for the algebra
that corresponds to $X$. As with many similar constructions
(see, for example, the discussion of adjoint maps
in duality [III.19]), a suitable map from $X$ to $Y$ gives
rise to a map from $\mathbb{C}[Y]$ to $\mathbb{C}[X]$. More precisely, the
map $\phi$ from $X$ to $Y$ has to be polynomial (in a suitable
sense) and the resulting map from $\mathbb{C}[Y]$ to $\mathbb{C}[X]$ is an
algebra homomorphism $\phi^*$ that satisfies the formula
$\phi^*(p)(x) = p(\phi x)$ for every $x \in X$.

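Going back to our example, the set $SL_2(\mathbb{C})$ has a group
structure $SL_2(\mathbb{C}) \times SL_2(\mathbb{C}) \to SL_2(\mathbb{C})$ defined by the
matrix product. The set $SL_2(\mathbb{C}) \times SL_2(\mathbb{C})$ is a variety in
$\mathbb{C}^8$ and the matrix product depends in a polynomial way
on the entries in the matrices, so we obtain an algebra
homomorphism $\Delta : \mathbb{C}[SL_2] \to \mathbb{C}[SL_2] \otimes \mathbb{C}[SL_2]$, which is
known as the coproduct. (The algebra $\mathbb{C}[SL_2] \otimes \mathbb{C}[SL_2]$
is isomorphic to $\mathbb{C}[SL_2 \times SL_2]$.) It turns out that $\Delta$ can
be expressed by the formula

$$\Delta \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \otimes \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
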
This formula needs a word or two of explanation: the
variables $a$, $b$, $c$, and $d$ are the four generators of the
algebra of polynomials in four variables (and hence of
its quotient by $ad - bc - 1$), and the right-hand side
is a shorthand way of saying that $\Delta a = a \otimes a + b \otimes c$,
and so on. Thus, $\Delta$ is defined on the generators by a
sort of mixture of tensor products [III.91] and matrix
multiplication.
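
One way to see why the coproduct takes this shape is to evaluate both sides on a pair of matrices. In the sketch below, which is an illustration and not part of the original text, the generators are treated as coordinate functions on 2 × 2 matrices, an element of the form p ⊗ q is treated as the two-variable function (g, h) ↦ p(g)q(h), and the helper names are hypothetical.

```python
def matmul(g, h):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(g[i][k] * h[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def entry(i, j):
    """Coordinate function picking out entry (i, j); a, b, c, d are
    entry(0, 0), entry(0, 1), entry(1, 0), entry(1, 1)."""
    return lambda g: g[i][j]

def coproduct_of_entry(i, j):
    """The element sum_k x_ik (x) x_kj, viewed as a function of two matrices.

    For (i, j) = (0, 0) this is a (x) a + b (x) c.
    """
    return lambda g, h: sum(entry(i, k)(g) * entry(k, j)(h) for k in range(2))

g = ((2, 1), (1, 1))   # determinant 1
h = ((1, 3), (0, 1))   # determinant 1
gh = matmul(g, h)

# The coproduct of each generator, evaluated at (g, h), equals that
# generator evaluated at the product gh.
checks = [coproduct_of_entry(i, j)(g, h) == entry(i, j)(gh)
          for i in range(2) for j in range(2)]
```

All four checks hold, which is just the rule for multiplying matrices written in the language of functions.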

One can then show that the associativity of matrix
multiplication in $SL_2$ is equivalent to the assertion
that $(\Delta \otimes \mathrm{id})\Delta = (\mathrm{id} \otimes \Delta)\Delta$. To understand what these
expressions mean, bear in mind that $\Delta$ takes elements
of $\mathbb{C}[SL_2]$ to elements of $\mathbb{C}[SL_2] \otimes \mathbb{C}[SL_2]$. Thus, when
we apply the map $(\Delta \otimes \mathrm{id})\Delta$, for example, we begin
by applying $\Delta$, and thereby creating an element of
$\mathbb{C}[SL_2] \otimes \mathbb{C}[SL_2]$. This element will be a linear combination
of elements of the form $p \otimes q$, each of which will
then be replaced by $\Delta p \otimes q$.

Similarly, one can express the rest of the group structure
of $SL_2(\mathbb{C})$ equivalently in terms of the algebra
$\mathbb{C}[SL_2]$. There is a counit map $\epsilon : \mathbb{C}[SL_2] \to \mathbb{C}$, which
corresponds to the group identity, and an antipode map
$S : \mathbb{C}[SL_2] \to \mathbb{C}[SL_2]$, which corresponds to the group
inversion. The group axioms appear as equivalent properties
of these maps, making $\mathbb{C}[SL_2]$ into a “Hopf algebra”
or “quantum group.” The formal definition is as
follows.

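Definition. A Hopf algebra over a field $k$ is a quadruple
$(H, \Delta, \epsilon, S)$, where

(i) $H$ is a unital algebra over $k$;

(ii) $\Delta : H \to H \otimes H$ and $\epsilon : H \to k$ are algebra
homomorphisms such that $(\Delta \otimes \mathrm{id})\Delta = (\mathrm{id} \otimes \Delta)\Delta$ and
$(\epsilon \otimes \mathrm{id})\Delta = (\mathrm{id} \otimes \epsilon)\Delta = \mathrm{id}$;

(iii) $S : H \to H$ is a linear map such that $m(\mathrm{id} \otimes S)\Delta =
m(S \otimes \mathrm{id})\Delta = 1\epsilon$, where $m$ is the product operation
on $H$.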

There are two great things about this formulation.
The first is that the notion of a Hopf algebra makes
sense over any field. The second is that nowhere did
we demand that $H$ was commutative. Of course, if $H$ is
derived from a group, then it certainly is commutative
(since multiplying two polynomials is commutative), so
if we can find a noncommutative Hopf algebra, then we
have obtained a strict generalization of the notion of
a group. The great discovery of the past two decades
is that there are indeed many natural noncommutative
examples.

For example, the quantum group $\mathbb{C}_q[SL_2]$ is defined
as the free associative noncommutative algebra on
symbols $a$, $b$, $c$, and $d$ modulo the relations

$$ba = qab, \quad bc = cb, \quad ca = qac, \quad dc = qcd,$$
$$db = qbd, \quad da = ad + (q - q^{-1})bc, \quad ad - q^{-1}bc = 1.$$

This forms a Hopf algebra with $\Delta$ given by the same
formula as it is for $\mathbb{C}[SL_2]$ and with suitable maps $\epsilon$ and
$S$. Here $q$ is a nonzero element of $\mathbb{C}$, and as $q \to 1$ one
obtains $\mathbb{C}[SL_2]$. This example generalizes to canonical
examples $\mathbb{C}_q[G]$ for all complex simple Lie groups $G$.

Much of group theory and Lie group theory can be
generalized to quantum groups. For example, Haar integration
is a linear map $\int : H \to k$ that is translation
invariant in a certain sense that involves $\Delta$. If it exists,
it is unique up to a scalar multiple, and it does indeed
exist in most cases of interest, including all finite-dimensional
Hopf algebras. Likewise, the notion of a
complex of differential forms [III.16] $(\Omega, d)$ makes
sense over any algebra $H$ as a proxy for a differential
structure. Here, $\Omega = \bigoplus_n \Omega^n$ is required to be an associative
algebra generated by $\Omega^0 = H$ and $\Omega^1$, but one
does not assume that it is graded-commutative as in
the classical case. When $H$ is a Hopf algebra one can
ask that $\Omega$ is translation invariant, again in a certain
sense that involves the coproduct $\Delta$. In this case both $\Omega$
and its cohomology [IV.10 §4] as a complex are super
(or graded) quantum groups. The axioms of a (graded)
Hopf algebra were originally introduced by Heinz Hopf
in 1947 precisely to express the structure of the cohomology
ring of a group, so this result brings us back
full circle to the origins of the subject. For most quantum
groups, including all the $\mathbb{C}_q[G]$, one has a natural
minimal complex $(\Omega, d)$. Thus, a “quantum group” is
not merely a Hopf algebra but has additional structure
analogous to that of a Lie group.

There are many other quantum groups that are not
related to q-deformations. There are also applications
of the theory to finite groups. If $G$ is a finite group, one
has a corresponding algebra $k(G)$ of all functions on $G$
with pointwise product and a coproduct $(\Delta f)(g, h) =
f(gh)$ for $f \in k(G)$ and $g, h \in G$. Here we identify
$k(G) \otimes k(G)$ and $k(G \times G)$, which makes $\Delta f$ into a function
of two variables, and one may check even more
simply that this is a Hopf algebra. There can never be
an interesting classical differential structure on a finite
set, but if we use the methods developed for quantum
groups, then we have one or more translation-invariant
complexes $(\Omega, d)$ on any finite group. Applying further
parts of the theory of quantum group differential
geometry, one finds, for example, that the alternating
group $A_4$ is naturally Ricci-flat, while the symmetric
group $S_3$ naturally has constant curvature [III.13],
much like a 3-sphere.
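
For a finite group the coproduct (Δf)(g, h) = f(gh) and the coassociativity axiom can be checked by brute force. The following is a minimal sketch under assumptions of my own: the group is taken to be the symmetric group on three letters, functions are ordinary Python callables, and the test function f is arbitrary.

```python
from itertools import permutations

G = list(permutations(range(3)))   # the six elements of S_3

def mul(g, h):
    """Composition of permutations: (gh)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(3))

def coproduct(f):
    """(Delta f)(g, h) = f(gh), a function of two group elements."""
    return lambda g, h: f(mul(g, h))

# An arbitrary function on G, chosen only for illustration.
values = {g: n * n + 1 for n, g in enumerate(G)}
f = lambda g: values[g]

def left(g, h, k):
    """((Delta (x) id) Delta f)(g, h, k): apply Delta in the first slot."""
    return coproduct(lambda x: coproduct(f)(x, k))(g, h)

def right(g, h, k):
    """((id (x) Delta) Delta f)(g, h, k): apply Delta in the second slot."""
    return coproduct(lambda y: coproduct(f)(g, y))(h, k)

coassociative = all(left(g, h, k) == right(g, h, k)
                    for g in G for h in G for k in G)
```

The left-hand version evaluates f at (gh)k and the right-hand version at g(hk), so the check succeeds precisely because multiplication in the group is associative.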

2 Quantum Symmetry 

Symmetry in mathematics is usually expressed as the
action of a group or Lie algebra of finite or infinitesimal
transformations of some structure. If you have a collection
of transformations that is closed under inversion
and composition, then you necessarily have an
ordinary group. So how might one generalize this? The
answer is that one begins by observing that a group $G$
can act on several objects at the same time. If a group
acts on two objects $X$ and $Y$, then it also acts on their
direct product $X \times Y$, with $g(x, y) = (gx, gy)$. Here
we are making implicit use of a diagonal or “duplication”
map $\Delta : G \to G \times G$, which duplicates a group
element so that one copy can act on the first object
and the other on the second object. In order to generalize
this it once again pays to replace the notion of
a group $G$ by that of an algebra. This time we use the
group algebra $kG$, which is the set of all formal linear
combinations $\sum_i \lambda_i g_i$, where the $g_i$ are elements of $G$
and the $\lambda_i$ are scalars from the field $k$. The elements
of $G$ (considered as particularly simple linear combinations
of this kind) form a basis of $kG$ and we multiply
them as we would in $G$ itself. One then extends this
definition to products of more general linear combinations
in the obvious way. We also extend $\Delta$ linearly
from $\Delta g = g \otimes g$ on the basis elements to a map from
$kG$ to $kG \otimes kG$. Together with some associated maps
$\epsilon$ and $S$, this makes $kG$ into a Hopf algebra. Note that
this is a completely different use of the coproduct from
the one in the previous section, since the group product
has already gone into the algebra. One has a similar
story for the “enveloping algebra” $U(\mathfrak{g})$ associated with
any Lie algebra $\mathfrak{g}$; this is generated by a basis of $\mathfrak{g}$ with
certain relations and becomes a Hopf algebra with the
coproduct $\Delta\xi = \xi \otimes 1 + 1 \otimes \xi$ “sharing out” an element
$\xi \in \mathfrak{g}$ for the purposes of acting on a tensor product of
objects on which $\mathfrak{g}$ acts.
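
The group algebra and its coproduct are easy to model with dictionaries. In this sketch, which is illustrative only, an element of kG is a dictionary from group elements (permutations of three letters, an assumed choice) to integer coefficients, and an element of kG ⊗ kG is a dictionary keyed by pairs; all function names are hypothetical.

```python
def mul(g, h):
    """Composition of permutations: (gh)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(len(g)))

def algebra_product(x, y):
    """Product in kG: multiply basis elements as in G, extend bilinearly."""
    out = {}
    for g, lam in x.items():
        for h, mu in y.items():
            gh = mul(g, h)
            out[gh] = out.get(gh, 0) + lam * mu
    return out

def coproduct(x):
    """Delta(g) = g (x) g on basis elements, extended linearly to kG."""
    return {(g, g): lam for g, lam in x.items()}

def tensor_product_mul(s, t):
    """Product in kG (x) kG, multiplying each tensor factor separately."""
    out = {}
    for (g1, g2), lam in s.items():
        for (h1, h2), mu in t.items():
            key = (mul(g1, h1), mul(g2, h2))
            out[key] = out.get(key, 0) + lam * mu
    return out

e, swap, cycle = (0, 1, 2), (1, 0, 2), (1, 2, 0)
x = {e: 2, swap: -1}        # 2e - swap
y = {cycle: 3, swap: 5}     # 3*cycle + 5*swap

# Delta respects products: Delta(xy) = Delta(x) Delta(y).
is_algebra_map = (coproduct(algebra_product(x, y))
                  == tensor_product_mul(coproduct(x), coproduct(y)))
```

The check reflects the fact that the group product lives in the algebra structure here, while the coproduct merely duplicates each group element.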

Extrapolating from these two examples, a general
“quantum symmetry” means an algebra $H$ equipped
with further structure $\Delta$ that allows one to form a tensor
product $V \otimes W$ of any two representations $V$, $W$ of
the algebra in an associative manner. An element $h \in H$
acts as $h(v \otimes w) = (\Delta h)(v \otimes w)$, where one part of $\Delta h$
acts on $v \in V$ and another part on $w \in W$. This is
a second route to the Hopf algebra axioms we gave in
the previous section.

Note that, in the examples just given, $\Delta$ has had a
symmetric output. As a consequence, if $V$ and $W$ are
representations of a group or Lie algebra, then $V \otimes W$
and $W \otimes V$ are isomorphic via the obvious map that
takes $v \otimes w$ to $w \otimes v$. In general, however, $V \otimes W$ and
$W \otimes V$ may be unrelated, so it is now the tensor product
that is being made noncommutative. In nice examples it
may be the case that $V \otimes W \cong W \otimes V$, but not necessarily
by the obvious map. Instead, there may be a nontrivial
isomorphism for every pair $V$, $W$, which may nevertheless
obey some reasonable conditions. This happens
for a large class of examples, denoted by $U_q(\mathfrak{g})$
and associated with all complex simple Lie algebras.
For these examples, the isomorphism obeys the braid
or Yang-Baxter relations among any three representations
(see braid groups [III.4]). As a result, these quantum
groups lead to knot and 3-manifold invariants
[III.46] (the Jones knot invariant comes from the example
$U_q(\mathfrak{sl}_2)$, where $\mathfrak{sl}_2$ is the Lie algebra of the group
$SL_2(\mathbb{C})$). The parameter $q$ can usefully be regarded here
as a formal variable, and these examples can be thought
of as some kind of deformation of the classical enveloping
algebras $U(\mathfrak{g})$. They arose originally in work of Drinfeld
and of Jimbo in the theory of quantum integrable
systems.


3 Self-duality 

A third point of view is that Hopf algebras are the next
simplest category [III.8] after Abelian groups of structures
that admit a Fourier transform [III.27]. It is not
immediately obvious, but the axioms (i)-(iii) in the definition
we gave earlier have a certain symmetry. One
can write out the requirement (i) of a unital algebra $H$
in terms of linear maps $m : H \otimes H \to H$ and $\eta : k \to H$
(here $\eta$ specifies the identity element of $H$ as the image
of $1 \in k$) that have to obey some straightforward commutative
diagrams. If you reverse all the arrows in
these diagrams, then you have the axioms displayed in
(ii), obtaining what could be called a “coalgebra.” The
requirement that the coalgebra structures $\Delta$ and $\epsilon$ are
algebra maps is given by a collection of diagrams that
is invariant under arrow reversal. Finally, the axioms
in (iii), as commutative diagrams, are invariant under
arrow reversal in the above sense.

Thus, the axioms of a Hopf algebra have the special
property of being symmetric under arrow reversal.
A practical consequence is that if $H$ is a finite-dimensional
Hopf algebra, then so is $H^*$, with all
structure maps defined as the adjoints of those of
$H$ (which necessarily reverses arrows). In the infinite-dimensional
case one needs a suitable topological dual,
or one can just speak of two Hopf algebras as dually
paired to each other. For instance, $\mathbb{C}_q[SL_2]$ and $U_q(\mathfrak{sl}_2)$
above are dually paired, while if $G$ is finite then $(kG)^* =
k(G)$, the Hopf algebra of functions on $G$.

As an application, let $H$ be finite dimensional with
basis $\{e_a\}$, let $H^*$ have a dual basis $\{f^a\}$, and let $\int$
denote a right-translation-invariant integral on $H$. The
Fourier transform $\mathcal{F} : H \to H^*$ is defined as

$$\mathcal{F}(h) = \sum_a \Big( \int e_a h \Big) f^a$$

and has many remarkable properties. A special case is
a Fourier transform $\mathcal{F} : k(G) \to kG$ for any finite group
$G$, which does not have to be Abelian. If $G$ happens to
be Abelian, then $kG \cong k(\hat{G})$, where $\hat{G}$ is the group of
characters, and we recover the usual Fourier transform
for finite Abelian groups. The point is that in the non-Abelian
case, $kG$ is not commutative and hence not the
algebra of functions on any usual “Fourier dual” space.

This point of view is responsible for the second
main class of genuine quantum groups to have been
discovered, namely the “bicrossproduct” ones of self-dual
form. They are simultaneously “coordinate” and
“symmetry” algebras, and are truly connected with
quantum mechanics. An example, which is written
$\mathbb{C}[\mathbb{R}^3 \rtimes \mathbb{R}]_\lambda \blacktriangleright\!\!\triangleleft\, U(\mathfrak{so}(1,3))$, is the so-called Poincaré
quantum group of a certain noncommutative spacetime
with coordinates $x$, $y$, $z$, $t$, where $t$ does not commute
with the other variables. This quantum group can also
be interpreted as the quantization of a particle moving
in a curved geometry with black-hole-like features. In
essence, the self-duality of quantum groups provides a
paradigm for “toy models” of the unification of gravity
(as spacetime geometry) and quantum theory.

Figure 1 Putting quantum groups in context. Self-dual
categories are shown on the horizontal axis.

This is part of a wider picture indicated in figure 1.
A category of objects with a coherent notion of “tensor
product” is called a monoidal (or tensor) category, and
we have seen that this is the case for representations
of quantum groups. There, one also has a “forgetful
functor” to the category of vector spaces, which forgets
the quantum group action. This embeds quantum
groups into the next most general self-dual category (in
a representation-theoretic sense), namely that of functors
between monoidal categories. Over on the right,
I have included Boolean algebras as primitive structures
with (de Morgan) duality. However, the connection
between duality here and the other dualities is
speculative.

Further Reading 

Majid, S. 2002. A Quantum Groups Primer. London Mathematical
Society Lecture Notes, volume 292. Cambridge:
Cambridge University Press.


III.78 Quaternions, Octonions, and
Normed Division Algebras


Mathematics took a leap forward in sophistication with
the introduction of the complex numbers [I.3 §1.5]. To
define these, one suspends one’s disbelief, introduces a
new number $i$, and declares that $i^2 = -1$. A typical complex
number is of the form $a + ib$, and the arithmetic
of complex numbers is easy to deduce from the normal
rules of arithmetic for real numbers. For example,
to calculate the product of $1 + 2i$ and $2 + i$ one simply
expands some brackets:

$(1 + 2i)(2 + i) = 2 + 5i + 2i^2 = 5i,$

the last equality following from the fact that $i^2 = -1$.
One of the great advantages of the complex numbers
is that, if complex roots are allowed, every polynomial
can be factorized into linear factors: this is the famous
fundamental theorem of algebra [V.15].

Another way to define a complex number is to say
that it is a pair of real numbers. That is, instead of writing
$a + ib$ one writes simply $(a, b)$. To add two complex
numbers is simple, and exactly what one does when
adding two vectors: $(a, b) + (c, d) = (a + c, b + d)$. However,
it is less obvious how to multiply: the product of
$(a, b)$ and $(c, d)$ is $(ac - bd, ad + bc)$, which seems
an odd definition unless one goes back to thinking of
$(a, b)$ and $(c, d)$ as $a + ib$ and $c + id$.
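
The pair description translates directly into code. This is a small sketch with hypothetical function names; it also compares the result with Python's built-in complex type to confirm that the "odd" product rule is the usual one in disguise.

```python
def add(p, q):
    """Sum of two complex numbers given as pairs of reals."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    """Product of two complex numbers given as pairs of reals."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

product = mul((1, 2), (2, 1))                 # (1 + 2i)(2 + i) as pairs
builtin = complex(1, 2) * complex(2, 1)       # the same product, built in
agree = (builtin.real, builtin.imag) == product
```

Here `product` is the pair (0, 5), matching the calculation (1 + 2i)(2 + i) = 5i carried out earlier.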

Nevertheless, the second definition draws our attention
to the fact that the complex numbers are formed
out of the two-dimensional vector space [I.3 §2.3] $\mathbb{R}^2$
with a carefully chosen definition of multiplication.
This immediately raises a question: could we do the
same for higher-dimensional spaces?

As it stands, this question is not wholly precise, since
we have not been clear about what “the same” means.
To make it precise, we must ask what properties this
multiplication should have. So let us return to $\mathbb{R}^2$ and
think about why it would be a bad idea to define the
product of $(a, b)$ and $(c, d)$ in a simple-minded way as
$(ac, bd)$. Of course, part of the reason is that the product
of $a + ib$ and $c + id$ is not $ac + ibd$, but why should
we not also be interested in other ways of multiplying
vectors in $\mathbb{R}^2$?

The trouble with this alternative definition is that
it allows zero divisors, that is, pairs of nonzero numbers
that multiply together to give zero. For example,
it gives us $(1, 0)(0, 1) = (0, 0)$. If we have zero divisors,
then we cannot have multiplicative inverses, since
if every nonzero number in a number system has a multiplicative
inverse, and if $xy = 0$, then either $x = 0$ or
$y = x^{-1}xy = x^{-1}0 = 0$. And if we do not have multiplicative
inverses, then we cannot define a useful notion
of division.

Let us return then to the usual definition of the complex
numbers and try to think how we can go beyond it.
One way we might try to “do the same” as we did before
is to do to the complex numbers what we did to the real
numbers. That is, why not define a “super-complex”
number to be an ordered pair $(z, w)$ of complex numbers?
Since we still want to have a vector space, we will
continue to define the sum of $(z, w)$ and $(u, v)$ to be
$(z + u, w + v)$, but we need to think about the best way
of defining their product. An obvious guess is to use
precisely the expression that worked before, namely
$(zu - wv, zv + wu)$. But if we do that, then the product
of $(1, i)$ and $(1, -i)$ works out to be $(1 + i^2, i - i) = (0, 0)$,
so we have zero divisors.
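
Both failed attempts at a product can be checked in a few lines. The sketch below uses hypothetical function names and Python's built-in complex numbers for the entries of the pairs.

```python
def componentwise_mul(p, q):
    """The simple-minded product (a, b)(c, d) = (ac, bd) on pairs of reals."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

def naive_mul(p, q):
    """The complex-number formula reused unchanged on pairs of complex
    numbers: (z, w)(u, v) = (zu - wv, zv + wu)."""
    (z, w), (u, v) = p, q
    return (z * u - w * v, z * v + w * u)

# Two nonzero pairs whose componentwise product is zero.
first_zero = componentwise_mul((1, 0), (0, 1))

# Two nonzero "super-complex" numbers whose naive product is zero.
second_zero = naive_mul((1, 1j), (1, -1j))
```

Both results equal (0, 0) even though none of the four factors is zero, so neither rule can support division.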

This example came from the following thought. The
modulus of a complex number $z = a + ib$, which measures
the length of the vector $(a, b)$, is the real number
$|z| = \sqrt{a^2 + b^2}$. This can also be written as $\sqrt{z\bar{z}}$, where
$\bar{z}$ is the complex conjugate $a - ib$ of $z$. Now if $a$ and
$b$ are allowed to take complex values, then there is no
reason for $a^2 + b^2$ to be nonnegative, so we may not be
able to take its square root. Moreover, if $a^2 + b^2 = 0$
it does not follow that $a = b = 0$. The example above
came from taking $a = 1$ and $b = i$ and multiplying the
number $(1, i)$ by its “conjugate” $(1, -i)$.

There is, nevertheless, a natural way to define the
modulus of a pair $(z, w)$ that works even when $z$ and $w$
are complex numbers. The number $|z|^2 + |w|^2$ is guaranteed
to be nonnegative, so we can take its square
root. Moreover, if $z = a + ib$ and $w = c + id$, then
we will obtain the number $(a^2 + b^2 + c^2 + d^2)^{1/2}$, which
is the length of the vector $(a, b, c, d)$.

This observation leads to another: the complex conjugate
of a real number is the number itself, so, if we
want to “use the same formula” for the complex numbers
as we used for the reals, we are free to introduce
complex conjugates into that formula. Before we try to
do that, let us think about what we might mean by the
“conjugate” of a pair $(z, w)$. We expect $(z, 0)$ to behave
like the complex number $z$, so its conjugate should be
$(\bar{z}, 0)$. Similarly, if $z$ and $w$ are real, then the conjugate
of $(z, w)$ should be $(z, -w)$. This leaves us with two
reasonable possibilities for a general pair $(z, w)$: either
$(\bar{z}, -\bar{w})$ or $(\bar{z}, -w)$. Let us consider the second of these.

We would like the product of (z, w) and its conjugate,
which we are defining as $(\bar{z}, -w)$, to be $(|z|^2 + |w|^2, 0)$.
We want to achieve this by introducing complex conjugates
into the formula

\[ (z, w)(u, v) = (zu - wv, zv + wu). \]

An obvious way of getting the result we want is to take

\[ (z, w)(u, v) = (zu - \bar{w}v, \bar{z}v + wu), \]

and this modified formula, it turns out, defines an asso- 
ciative binary operation [1.2 §2.4] on the set of pairs 
(z, w). If you try the other definition of conjugate, you
will find that you end up with zero divisors. (A first 
indication of trouble is that, under the other definition, 
the pair (0, i) is its own conjugate.) 
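
The two candidate products can be compared numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, pairs of complex numbers are stored as Python tuples, and the modified formula is taken to be $(zu - \bar{w}v, \bar{z}v + wu)$ with conjugate $(\bar{z}, -w)$.

```python
def naive_product(p, q):
    """Formula carried over unchanged from the real case: (zu - wv, zv + wu)."""
    (z, w), (u, v) = p, q
    return (z * u - w * v, z * v + w * u)


def quaternion_product(p, q):
    """Modified formula with conjugates: (zu - conj(w) v, conj(z) v + wu)."""
    (z, w), (u, v) = p, q
    return (z * u - w.conjugate() * v, z.conjugate() * v + w * u)


def conjugate(p):
    """Conjugate of the pair (z, w), taken here to be (conj(z), -w)."""
    z, w = p
    return (z.conjugate(), -w)


# The naive formula sends the nonzero pairs (1, i) and (1, -i) to zero.
zero_divisor = naive_product((1 + 0j, 1j), (1 + 0j, -1j))

# With the modified formula, a pair times its conjugate is (|z|^2 + |w|^2, 0).
p = (1 + 2j, 3 - 1j)
norm_pair = quaternion_product(p, conjugate(p))

print(zero_divisor)
print(norm_pair)
```

Here $|1 + 2i|^2 + |3 - i|^2 = 5 + 10 = 15$, so the second product should have first component 15 and second component 0.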

We have just defined the quaternions, a set $\mathbb{H}$ of
“numbers” that form a four-dimensional real vector 
space, or alternatively a two-dimensional complex vec- 
tor space. (The letter “H” is in honor of William Rowan 
Hamilton, their discoverer. See Hamilton [VI.37] for
the story of how the discovery was made.) But why 
should we have wished to do that? This question 
becomes particularly pressing when we notice that the 
notion of multiplication that we have defined is not 
commutative. For example, (0, 1)(i, 0) = (0, i), while
(i, 0)(0, 1) = (0, -i).

To answer it, let us take a step back and think about 
the complex numbers again. The most obvious justifi- 
cation for introducing those is that one can use them to 
solve all polynomial equations, but that is by no means 
the only justification. In particular, complex numbers 
have an important geometrical interpretation, as rota- 
tions and enlargements. This connection becomes par- 
ticularly clear if we choose yet another way of writing 
the complex number a + ib, as the matrix $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$. Multiplication
by the complex number a + ib can be thought
of as a linear map [1.3 §4.2] on the plane $\mathbb{R}^2$, and this is
the matrix of that linear map. For example, the complex
number i corresponds to the matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, which is the
matrix of a counterclockwise rotation through $\frac{1}{2}\pi$, and
this rotation is exactly what multiplying by i does to 
the complex plane. 

If complex numbers can be thought of as linear maps 
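from $\mathbb{R}^2$ to $\mathbb{R}^2$, then quaternions should have an interpretation
as linear maps from $\mathbb{C}^2$ to $\mathbb{C}^2$. And indeed
they do. Let us associate with the pair (z, w) the matrix
$\begin{pmatrix} z & \bar{w} \\ -w & \bar{z} \end{pmatrix}$. Now let us consider the product of two such
matrices:

\[
\begin{pmatrix} z & \bar{w} \\ -w & \bar{z} \end{pmatrix}
\begin{pmatrix} u & \bar{v} \\ -v & \bar{u} \end{pmatrix}
=
\begin{pmatrix} zu - \bar{w}v & z\bar{v} + \bar{w}\bar{u} \\ -\bar{z}v - wu & \bar{z}\bar{u} - w\bar{v} \end{pmatrix}.
\]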

This is precisely the matrix associated with the pair 
$(zu - \bar{w}v, \bar{z}v + wu)$, which is the quaternionic product
of (z, w) and (u, v)! As an immediate corollary,
we have a proof of a fact mentioned earlier:
that quaternionic multiplication is associative. Why?
Because matrix multiplication is associative. (And that
is true because the composition of functions is associative:
see [1.3 §3.2].)

Notice that the determinant [III.15] of the matrix
$\begin{pmatrix} z & \bar{w} \\ -w & \bar{z} \end{pmatrix}$ is $|z|^2 + |w|^2$, so the modulus of the pair (z, w)
(which is defined to be $\sqrt{|z|^2 + |w|^2}$) is just the square root of the
determinant of the associated matrix. This proves that the modulus
of the product of two quaternions is the product of
their moduli (since the determinant of a product is the
product of determinants). Notice also that the adjoint
of the matrix (that is, the complex conjugate of the
transpose matrix) is $\begin{pmatrix} \bar{z} & -\bar{w} \\ w & z \end{pmatrix}$, which is the matrix associated
with the conjugate pair $(\bar{z}, -w)$. Finally, notice
that if $|z|^2 + |w|^2 = 1$, then

\[
\begin{pmatrix} z & \bar{w} \\ -w & \bar{z} \end{pmatrix}
\begin{pmatrix} \bar{z} & -\bar{w} \\ w & z \end{pmatrix}
=
\begin{pmatrix} |z|^2 + |w|^2 & 0 \\ 0 & |z|^2 + |w|^2 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]

which tells us that the matrix is unitary [III.52 §3.1].
Conversely, any unitary 2 × 2 matrix with determinant 1
can easily be shown to have the form $\begin{pmatrix} z & \bar{w} \\ -w & \bar{z} \end{pmatrix}$. Therefore,
the unit quaternions (that is, the quaternions
of modulus 1) have a geometrical interpretation: they
correspond to the “rotations” of $\mathbb{C}^2$ (that is, the unitary
maps of determinant 1), just as the unit complex
numbers correspond to the rotations of $\mathbb{R}^2$.
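
The matrix picture can be checked on concrete numbers. The sketch below is an illustration rather than anything from the original text: it assumes the matrix associated with (z, w) has rows $(z, \bar{w})$ and $(-w, \bar{z})$ and that the quaternionic product is $(zu - \bar{w}v, \bar{z}v + wu)$; all function names are hypothetical.

```python
def quaternion_product(p, q):
    """Product of pairs of complex numbers: (zu - conj(w) v, conj(z) v + wu)."""
    (z, w), (u, v) = p, q
    return (z * u - w.conjugate() * v, z.conjugate() * v + w * u)


def matrix_of(pair):
    """2x2 complex matrix associated with the pair (z, w)."""
    z, w = pair
    return ((z, w.conjugate()), (-w, z.conjugate()))


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def adjoint(m):
    """Complex conjugate of the transpose."""
    return tuple(tuple(m[j][i].conjugate() for j in range(2)) for i in range(2))


def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


p = (1 + 2j, 3 - 1j)
q = (2 - 1j, 0 + 1j)

# Multiplying the matrices agrees with multiplying the pairs.
product_matches = matmul(matrix_of(p), matrix_of(q)) == matrix_of(quaternion_product(p, q))

# The determinant is |z|^2 + |w|^2, the square of the modulus.
determinant = det(matrix_of(p))

# For a pair of modulus 1, matrix times adjoint is (numerically) the identity.
unit = (0.6 + 0j, 0 + 0.8j)
gram = matmul(matrix_of(unit), adjoint(matrix_of(unit)))
```

The entries of `p` and `q` are Gaussian integers, so the first comparison is exact; the unitarity check uses floating-point values and should be read up to rounding error.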

The group of unitary transformations of $\mathbb{C}^2$ of determinant
1 is an important Lie group [III.50 §1] called
the special unitary group SU(2). Another important Lie
group is the group SO(3), of rotations of $\mathbb{R}^3$. Surprisingly,
the unit quaternions can be used to describe this
group as well. To see this, it is convenient to present 
the quaternions in another, more conventional, way. 

Quaternions, as they are usually introduced, are a 
system of numbers where -1 has not just one square
root but three, called i, j, and k. Once one knows that
$i^2 = j^2 = k^2 = -1$, and also that ij = k, jk = i, and
ki = j, one has all the information one needs to multiply
two quaternions. For example, ji = jjk = -k. A typical
quaternion takes the form a + ib + jc + kd, which corresponds
to the pair of complex numbers (a + ic, b + id) in
our previous way of thinking about quaternions. Now
if we want, we can think of this quaternion as a pair
(a, v), where a is a real number and v is the vector
(b, c, d) in $\mathbb{R}^3$. The product of (a, v) and (b, w) then
works out to be $(ab - v \cdot w, aw + bv + v \wedge w)$, where
$v \cdot w$ and $v \wedge w$ are the scalar and vector products of v
and w.

If q = (a, u) is a quaternion of modulus 1, then $a^2 + \|u\|^2 = 1$,
so we can write q in the form (cos θ, v sin θ)
with v a unit vector. This quaternion corresponds to
a counterclockwise rotation R about an axis in direction
v through an angle of 2θ. This angle is not what
one might at first expect, and neither is the way the
correspondence works. If w is another vector, we can
represent it as the quaternion (0, w). We would now
like a neat expression for the quaternion (0, Rw); it
turns out that (0, Rw) = q(0, w)q*, where q* is the
conjugate (cos θ, -v sin θ) of q, which is also its multiplicative
inverse, as q has modulus 1. So to do the rotation
R, you do not multiply by q but rather you conjugate
by q. (This is a different meaning of the word
“conjugate,” referring to multiplying on one side by
q and on the other side by $q^{-1}$.) Now if $q_1$ and $q_2$
are quaternions corresponding to rotations $R_1$ and $R_2$,
respectively, then

\[ q_2 q_1 (0, w) q_1^* q_2^* = q_2 q_1 (0, w) (q_2 q_1)^*, \]

from which it follows that $q_2 q_1$ corresponds to the rotation
$R_2 R_1$. This tells us that quaternionic multiplication
corresponds to composition of rotations. 
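
The conjugation recipe is easy to try out. The following sketch is an illustration only (the helper names `qmul` and `rotate` are hypothetical): it stores a quaternion as a (scalar, vector) pair, multiplies with the scalar-and-vector-product rule quoted above, and rotates a vector w by forming q(0, w)q*.

```python
import math


def qmul(p, q):
    """Product of (a, v) and (b, w): (ab - v.w, aw + bv + v x w)."""
    a, v = p
    b, w = q
    dot = sum(x * y for x, y in zip(v, w))
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    vec = tuple(a * wi + b * vi + ci for wi, vi, ci in zip(w, v, cross))
    return (a * b - dot, vec)


def rotate(axis, theta, w):
    """Rotate w about the unit vector `axis` through 2*theta via q (0, w) q*."""
    q = (math.cos(theta), tuple(math.sin(theta) * x for x in axis))
    q_conj = (q[0], tuple(-x for x in q[1]))
    return qmul(qmul(q, (0.0, w)), q_conj)[1]


# theta = pi/4 about the z-axis should give a quarter turn (angle 2*theta = pi/2).
rotated = rotate((0.0, 0.0, 1.0), math.pi / 4, (1.0, 0.0, 0.0))
print(rotated)
```

With θ = π/4 the vector (1, 0, 0) should land, up to rounding, on (0, 1, 0), which illustrates why the rotation angle is 2θ rather than θ.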

The unit quaternions form a group, as we have 
already seen: it is SU(2). It might appear that we have
shown that SU(2) is the same as the group SO(3) of
rotations of $\mathbb{R}^3$. However, we have not quite done this,
because for each rotation of $\mathbb{R}^3$ there are two unit
quaternions that give rise to it. The reason is simple:
a counterclockwise rotation through θ about a vector
v is the same as a counterclockwise rotation through
-θ about -v. In other words, if q is a unit quaternion,
then q and -q give rise to the same rotation of $\mathbb{R}^3$. So
SU(2) is not isomorphic to SO(3); rather, it is a double
cover of SO(3). This fact has important ramifications in
mathematics and physics. In particular, it lies behind 
the notion of the “spin” of an elementary particle. 

Let us return to the question we were considering 
earlier: for which n is there a good way of multiplying 
vectors in $\mathbb{R}^n$? We now know that we can do it for n = 1,
2, or 4. When n = 4 we had to sacrifice commutativity, 
but we were amply rewarded for this, since quaternion 
multiplication gives a very concise way of representing 
the important groups SU(2) and SO(3). These groups
are not commutative, so it was essential to our success
that quaternion multiplication should also not be
commutative. 

One obvious thing we can do is continue the process 
that led to the quaternions. That is, we can consider 
pairs (q, r) of quaternions, and multiply these pairs by 
the formula 

(q,r)(s, t) = (qs - r*t,q*t + rs). 

Since the conjugate q* of a quaternion q is the analogue
of the complex conjugate $\bar{z}$ of a complex number
z, this is basically the same formula that we used for
multiplication of pairs of complex numbers, that is,
for quaternions.

However, we need to be careful: multiplication of 
quaternions is not commutative, so there are in fact
many formulas we could write down that would be
“basically the same” as the earlier one. Why choose the
above one, rather than, say, replacing q*t by tq*?

It turns out that the formula suggested above leads 
to zero divisors. For example, (j, i) (1, k) works out to be 
(0, 0). However, the modified formula 

(q,r)(s, t) = (qs - tr*,q*t + sr), 
which one can discover fairly quickly if one bears in 
mind that one would like (q,r)(q* ,-r) to work out 
as $(|q|^2 + |r|^2, 0)$, does produce a useful number system.
It is denoted $\mathbb{O}$ and its elements are called the
octonions (or sometimes the Cayley numbers). Unfor- 
tunately, multiplication of octonions is not even asso- 
ciative, but it does have two very good properties: 
every nonzero octonion has a multiplicative inverse, 
and two nonzero octonions never multiply together to 
give zero. (Because octonion multiplication is not asso- 
ciative, these two properties are no longer obviously 
equivalent. However, any subalgebra of the octonions 
generated by two elements is associative, and this is 
enough to prove the equivalence.) 
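
Both formulas for pairs of quaternions can be tested directly. This is a minimal sketch under stated assumptions, not the author's code: quaternions are 4-tuples (a, b, c, d) standing for a + bi + cj + dk, multiplied with the standard Hamilton rules, and the function names are hypothetical.

```python
def qmul(x, y):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qconj(x):
    return (x[0], -x[1], -x[2], -x[3])


def qadd(x, y):
    return tuple(s + t for s, t in zip(x, y))


def qsub(x, y):
    return tuple(s - t for s, t in zip(x, y))


def naive_pair_product(x, y):
    """First attempt: (q, r)(s, t) = (qs - r*t, q*t + rs)."""
    (q, r), (s, t) = x, y
    return (qsub(qmul(q, s), qmul(qconj(r), t)), qadd(qmul(qconj(q), t), qmul(r, s)))


def octonion_product(x, y):
    """Modified formula: (q, r)(s, t) = (qs - tr*, q*t + sr)."""
    (q, r), (s, t) = x, y
    return (qsub(qmul(q, s), qmul(t, qconj(r))), qadd(qmul(qconj(q), t), qmul(s, r)))


def norm_squared(x):
    return sum(c * c for part in x for c in part)


ZERO, ONE = (0, 0, 0, 0), (1, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# The first formula makes (j, i)(1, k) vanish; the modified one does not.
naive_result = naive_pair_product((J, I), (ONE, K))
fixed_result = octonion_product((J, I), (ONE, K))

# Associativity fails for the modified product on these three elements.
x, y, z = (I, ZERO), (J, ZERO), (ZERO, ONE)
left = octonion_product(octonion_product(x, y), z)
right = octonion_product(x, octonion_product(y, z))
```

Integer coordinates keep every comparison exact. The triple used for the associativity check is one convenient choice among many.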

So now we have number systems when n = 1, 2, 4, or 
8. It turns out that these are the only dimensions with 
good notions of multiplication. Of course, “good” has 
a technical meaning here: matrix multiplication, which 
is associative but gives zero divisors, is for many pur- 
poses “better” than octonion multiplication, which has 
no zero divisors but is not associative. So let us finish 
by seeing more precisely what it is that is special about 
dimensions 1, 2, 4, and 8.

All the number systems constructed above have a 
notion of size given by a norm [III.64]. For real and com- 
plex numbers z, the norm of z is just its modulus. For 
a quaternion or octonion x, it is defined to be $\sqrt{x^* x}$,
where x* is the conjugate of x (a definition that works
for real and complex numbers as well). If we write $\|x\|$
for the norm of x, then the norms constructed have
the property that $\|xy\| = \|x\| \|y\|$ for every x and y.
This property is extremely useful: for example, it tells
us that the elements of norm 1 are closed under multiplication,
a fact that we used many times when discussing
the geometric importance of complex numbers
and quaternions. 

The feature that distinguishes dimensions 1, 2, 4, and 
8 from all other dimensions is that these are the only 
dimensions for which one can define a norm $\|\cdot\|$ and a
notion of multiplication with the following properties.

(i) There is a multiplicative identity: that is, a number
1 such that 1x = x1 = x for every x.

(ii) Multiplication is bilinear, meaning that x(y + z) =
xy + xz, and x(ay) = a(xy) whenever a is a real
number.

(iii) For any x and y, $\|xy\| = \|x\| \|y\|$ (and therefore
there are no zero divisors).

A normed division algebra is a vector space $\mathbb{R}^n$ together
with a norm and a method of multiplying vectors that
satisfy the above properties. So normed division algebras
exist only in dimensions 1, 2, 4, and 8. Furthermore,
even in these dimensions, $\mathbb{R}$, $\mathbb{C}$, $\mathbb{H}$, and $\mathbb{O}$ are the
only examples.

There are various ways to prove this fact, which is
known as Hurwitz’s theorem. Here is a very brief sketch
of one of them. The idea is to prove that if a normed
division algebra A contains one of the above examples,
then either it is that example, or it contains the next one
in the sequence. So either A is one of $\mathbb{R}$, $\mathbb{C}$, $\mathbb{H}$, and $\mathbb{O}$ or
A contains the algebra produced by doing to $\mathbb{O}$ the process
we used to construct $\mathbb{H}$ from $\mathbb{C}$ and $\mathbb{O}$ from $\mathbb{H}$, a process
known as the Cayley-Dickson construction. However,
if one applies the Cayley-Dickson construction to
O, one obtains an algebra with zero divisors. 

To see how such an argument might work, let us 
imagine, for the sake of example, that A contains O 
as a proper subalgebra. It turns out that the norm on 
A must be a Euclidean norm [III.37], that is, a norm
derived from an inner product. (Roughly speaking, this
is because multiplication by an element of norm 1 does
not change the norm, which gives A so many symmetries
that the norm on A has to be the most symmetric
of all, namely Euclidean.) Let us call an element of A
imaginary if it is orthogonal to the element 1. Then
we can define a conjugation operation on A by taking
1* to be 1 and x* to be -x when x is imaginary,
and extending linearly. This operation can be shown
to have all the properties one would like. In particular,
$aa^* = a^*a = \|a\|^2$ for every element a of A. Let us
choose a norm-1 element of A that is orthogonal to all
of $\mathbb{O}$ and call it i. Then i* = -i, so $1 = i^* i = -i^2$, so
$i^2 = -1$. Now take the algebra generated by i and the
copy of $\mathbb{O}$ that lies in A. With some algebraic manipulation,
one can demonstrate that this consists of elements
of the form x + iy, with x and y belonging to $\mathbb{O}$.
Moreover, the product of x + iy and z + iw turns out
to be $xz - wy^* + i(x^*w + zy)$, which is exactly what
the Cayley-Dickson construction gives. 

For further details about quaternions and octo- 
nions, there are two excellent sources: a discus- 
sion by John Baez at http://math.ucr.edu/home/baez/ 
octonions and a book, On Quaternions and Octonions: 
Their Geometry, Arithmetic, and Symmetry, by J. H. 
Conway and D. A. Smith (2003; Wellesley, MA: AK 
Peters). 
A linear representation of a finite group [1.3 §2.1] G is
a way of associating a linear map $T_g$, from some vector
space [1.3 §2.3] V to itself, with each element g of
G. Of course, this association must reflect the group
structure of G, so $T_g T_h$ should equal $T_{gh}$, and if e is
the identity of G, then $T_e$ should be the identity map
on V.
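
A small concrete case may help. The sketch below is an illustrative example that does not appear in the original text: it takes G to be the group of permutations of {0, 1, 2}, lets $T_g$ be the 3 × 3 permutation matrix sending the basis vector $e_i$ to $e_{g(i)}$, and checks the two conditions just stated. All function names are hypothetical.

```python
from itertools import permutations


def compose(g, h):
    """The permutation 'first h, then g', with permutations stored as tuples of images."""
    return tuple(g[h[i]] for i in range(len(h)))


def matrix(g):
    """Permutation matrix T_g, whose column i has its 1 in row g(i)."""
    n = len(g)
    return tuple(tuple(1 if g[col] == row else 0 for col in range(n)) for row in range(n))


def matmul(a, b):
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


group = list(permutations(range(3)))
identity = tuple(range(3))

# T_g T_h should equal T_{gh} for every pair of group elements.
is_homomorphism = all(
    matmul(matrix(g), matrix(h)) == matrix(compose(g, h))
    for g in group
    for h in group
)
print(is_homomorphism)
```

Here the group has six elements acting on a three-dimensional space, a modest instance of a representation whose dimension is smaller than the size of the group.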

One useful aspect of linear representations is that 
the dimension of the vector space V may be consider- 
ably smaller than the size of G. If this is the case, then 
the representation packages the information about G 
in a particularly efficient way. For example, the alternating
group [III.70] $A_5$, which has sixty elements, is
isomorphic to the group of rotational symmetries of
an icosahedron, and can therefore be thought of as
a group of transformations of $\mathbb{R}^3$ (or, equivalently, of
3 × 3 matrices).

A more fundamental reason for representations 
being useful is that every representation can be decomposed
into building blocks known as irreducible representations.
It turns out that a great deal of information
about G can be deduced from a few basic facts about 
its irreducible representations. 

These ideas can be generalized to infinite groups 
as well, and are particularly important in the case of
Lie groups [III.50 §1]. Since Lie groups have a differentiable
structure, the representations of interest are
those where the homomorphism $g \mapsto T_g$ reflects this
structure (for example, by being differentiable). 

Representations are discussed in much greater detail 
in REPRESENTATION THEORY [IV.12]. See also OPERATOR
ALGEBRAS [IV.19 §2]. 


III.80 Ricci Flow 

Terence Tao 


Ricci flow is a technique that allows one to take an arbitrary
Riemannian manifold [1.3 §6.10] and smooth
out the geometry of that manifold to make it look more 
symmetric. It has proven to be a very useful tool in 
understanding the topology of such manifolds. 

Ricci flow can be defined for Riemannian manifolds 
of any dimension, but for the sake of exposition we 
restrict ourselves here to two-dimensional manifolds 
(i.e., surfaces) as they are easy to visualize. From our 
everyday experience with three-dimensional space $\mathbb{R}^3$,
we are familiar with many surfaces, such as spheres, 
cylinders, planes, tori (the shape of the surface of a 
doughnut), and so forth. This is an extrinsic way to 
think about surfaces: as subsets of a larger ambient 
space, which in this case is three-dimensional Euclidean
space. On the other hand, one can think about
surfaces in a more abstract intrinsic manner: by con- 
sidering how the points in the surface stand in rela- 
tion to each other, but not in relation to any exter- 
nal space. (For instance, the Klein bottle makes perfect 
sense as a surface from an intrinsic viewpoint, but cannot
be viewed extrinsically in three-dimensional Euclidean
space $\mathbb{R}^3$, although it can be viewed extrinsically
in four-dimensional Euclidean space $\mathbb{R}^4$.) It turns out
that the two viewpoints are mostly equivalent to each 
other, but it will be more convenient here to adopt the 
intrinsic perspective. 

A good example of a surface is the surface of Earth. 
Extrinsically, this is a subset of a three-dimensional 
space $\mathbb{R}^3$. But we can also view this surface two-dimensionally
by using an atlas: a collection of maps or charts
that describe various regions of this surface by identi- 
fying them with a subset of a two-dimensional plane. 
As long as we have enough charts to cover the origi- 
nal surface, this atlas is sufficient to describe the sur- 
face. This way of thinking of a surface is not completely 
intrinsic, because there is more than one atlas that one 
could associate with this surface, and they may differ in 
various minor ways. For instance, in one atlas the city
of Los Angeles might be on the boundary of one of the
charts, whereas in another atlas it might be in the interior
of every chart that it appears in. However, there are
many facts one can deduce from an atlas that do not
depend on the choice of atlas; for instance, using any
accurate atlas of Earth one can see that it is impossible
to travel from Los Angeles to Sydney without crossing
at least one ocean. If a fact regarding a surface does not
depend on which atlas one uses, we say that it is intrin- 
sic or coordinate-independent. It will turn out that Ricci 
flow is an intrinsic flow on surfaces; it can be defined 
without any knowledge of charts or of some external 
space. 

We have informally described the mathematical con- 
cept of a surface, or two-dimensional manifold. But to 
describe Ricci flow we need the more sophisticated con- 
cept of a Riemannian surface (or two-dimensional Rie- 
mannian manifold). This is a surface M with an addi- 
tional (intrinsic) object, a Riemannian metric g, which 
specifies the distance d(x,y) between any two points 
x, y on the surface. This metric allows one to define 
the angle $\angle\gamma_1, \gamma_2$ that any two curves $\gamma_1$, $\gamma_2$ on the surface
make where they intersect; for instance, the Earth’s
equator intersects any longitude at right angles. And it
can also be used to define the area |A| of any given set
A on the surface (e.g., the area of Australia). There are a 
number of properties that these concepts of distance, 
angle, and area have to satisfy, but the most important 
property can be stated informally as follows: the geometry
of a Riemannian surface has to be very close to the
geometry of the Euclidean plane at small length scales. 

To give an example of what the above statement 
means, take any point x in the surface M, and pick 
any positive radius r. Because the Riemannian metric 
g specifies a notion of distance, we can define the disk 
B(x,r) of radius r centered at x to be the set of all 
points y whose distance d(x,y) to x is less than r. 
Because the Riemannian metric g defines a notion of 
area, we can then discuss the area of this disk B(x, r). 
In the Euclidean plane, this area would of course be 
$\pi r^2$. In a Riemannian surface, this need not be the case:
for instance, the total area of the surface of Earth (and
hence of all disks within this surface) is finite, even
though $\pi r^2$ can be arbitrarily large as r goes to infinity.
However, we do require that, when r is very small, the
area of the disk B(x, r) becomes increasingly close to
$\pi r^2$; more precisely, we require that the ratio between
the area and $\pi r^2$ converges to 1 in the limit as r tends
to 0. 
This brings us to the notion of scalar curvature R(x).
In some cases, such as on the sphere, the area |B(x, r)|
of a small disk B(x, r) is actually a little bit smaller
than $\pi r^2$; when this is the case, we say that the surface
has positive scalar curvature at x. In some other
cases, such as on a saddle, the area |B(x, r)| of a small
disk B(x, r) is a bit larger than $\pi r^2$; then we say that
the surface has negative scalar curvature at x. In other
cases again, such as on a cylinder, the area |B(x, r)|
of a small disk B(x, r) is equal (or very nearly equal)
to $\pi r^2$; in this case we say the surface has vanishing
scalar curvature at x. (This is despite the cylinder being
“curved” when viewed extrinsically as a subset of three-
dimensional space.) Note that on a complicated surface
it is perfectly possible to have positive scalar curvature
at some points of the surface and negative or vanishing
scalar curvature at other points. The scalar curvature
R(x) at any given point x can be defined more
precisely by the formula

\[ R(x) = \lim_{r \to 0} \frac{\pi r^2 - |B(x, r)|}{\pi r^4/24}. \]

(For surfaces in an external space, this intrinsic concept 
of scalar curvature is almost identical to the extrinsic 
concept of Gauss curvature, which we will not discuss 
here.) 
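
As a quick sanity check of this definition (a worked example that is not part of the original text), take the unit sphere. It assumes the standard fact that a disk of intrinsic radius r on the unit sphere is a spherical cap of area $2\pi(1 - \cos r)$. Expanding the cosine gives

\[
|B(x, r)| = 2\pi(1 - \cos r) = \pi r^2 - \frac{\pi r^4}{12} + O(r^6),
\qquad
R(x) = \lim_{r \to 0} \frac{\pi r^4/12 + O(r^6)}{\pi r^4/24} = 2 .
\]

So, under that assumption, small disks on the unit sphere are slightly smaller than their Euclidean counterparts and the scalar curvature is the positive constant 2 at every point.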

One can refine this notion to that of Ricci curvature
Ric(x)(v, v). Consider now an angular sector
A(x, r, θ, v) inside a small disk B(x, r) of small angular
aperture θ (measured in radians) about some direction
v (a unit vector) emanating from x. This sector is
well-defined, basically because the Riemannian metric
gives us the appropriate notions of distance and angle.
In Euclidean space, the area |A(x, r, θ, v)| of this sector
is $\frac{1}{2}\theta r^2$. But on a surface, the area |A(x, r, θ, v)|
might be slightly less (respectively, slightly more) than
$\frac{1}{2}\theta r^2$. In these cases we say that the surface has positive
(respectively, negative) Ricci curvature at x in the
direction v. More precisely, we have

\[ \mathrm{Ric}(x)(v, v) = \lim_{r \to 0} \lim_{\theta \to 0} \frac{\frac{1}{2}\theta r^2 - |A(x, r, \theta, v)|}{\theta r^4/24}. \]

Now it turns out that for surfaces, this more complicated
notion of curvature is in fact equal to half
the scalar curvature: $\mathrm{Ric}(x)(v, v) = \frac{1}{2}R(x)$. In particular,
the direction v plays no role in Ricci curvature
in two dimensions. However, it is possible to extend
all of the above concepts to other dimensions. (For
instance, to define scalar and Ricci curvature for three-
dimensional manifolds, one would use balls and solid
sectors instead of disks and angular sectors, as well as
making other necessary adjustments, such as replacing
the expression $\pi r^2$ with $\frac{4}{3}\pi r^3$.) In higher dimensions it
turns out that the Ricci curvature is more complicated 
than the scalar curvature. For instance, in three dimen- 
sions it is possible for a point x to have positive Ricci 
curvature in one direction but negative Ricci curvature 
in another; intuitively, this means that narrow sectors 
in the former direction “curve inward,” whereas narrow 
sectors in the latter direction “curve outward.” 

Now we can describe Ricci flow informally as the pro- 
cess of stretching the metric g in directions of negative 
Ricci curvature, and contracting the metric in directions 
of positive Ricci curvature. The stronger the curvature, 
the faster the stretching or contracting of the metric. 
The concepts of stretching and contracting will not be 
defined formally here, but they increase or decrease
the distance between points along these directions. By
changing the notion of distance, one also affects the
notions of angle and volume (though it turns out that
Ricci flow in two dimensions is conformal, which means
that the notion of angle remains unaffected by the
flow; this fact is closely related to the previously mentioned
fact that in two dimensions the Ricci curvature
is the same in all directions). Ricci flow can be described
succinctly and precisely by the equation

\[ \frac{\mathrm{d}}{\mathrm{d}t} g = -2\,\mathrm{Ric}, \]

although we will not define here exactly what it means
to differentiate the metric g with respect to the time 
variable t, or what it means for that derivative to equal 
the Ricci curvature multiplied by -2. 

In principle, one could perform Ricci flow on a mani- 
fold for as long a period of time as one wished. In prac- 
tice, however, it is possible (especially in the presence 
of positive curvature) for the Ricci flow to cause a man- 
ifold to develop singularities: points where it ceases 
to look like a manifold, and where the geometry may 
stop resembling Euclidean geometry even at very small 
scales. For example, if one starts with a perfectly round 
sphere and performs Ricci flow, what happens is that 
the sphere contracts at a steady rate until it becomes 
a point, which is no longer a two-dimensional mani- 
fold. In three dimensions, more complicated singular- 
ities are possible: for instance, one can have a neck 
pinch, in which a cylinder-like “neck” of the manifold 
shrinks under Ricci flow, until at one or more places 
along the neck, the cylinder has tapered down to a 
point. The types of possible singularity formations for 
three-dimensional Ricci flow were only classified completely
in a recent and very important paper of Grigori
Perelman. 
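
The round-sphere example can be made quantitative with a short calculation, sketched here under assumptions that the text does not spell out. Write the metric of a round sphere of radius ρ as $g = \rho^2 g_0$, where $g_0$ is the metric of the unit sphere, and assume two standard facts: the Ricci curvature does not change when the metric is multiplied by a constant, and $\mathrm{Ric}(g_0) = g_0$ in two dimensions. The statement that the time derivative of g equals -2 times the Ricci curvature then reduces to an ordinary differential equation for ρ:

\[
\frac{\mathrm{d}}{\mathrm{d}t}\bigl(\rho(t)^2 g_0\bigr) = -2 g_0
\quad\Longrightarrow\quad
\frac{\mathrm{d}}{\mathrm{d}t}\,\rho(t)^2 = -2
\quad\Longrightarrow\quad
\rho(t)^2 = \rho(0)^2 - 2t .
\]

Under these assumptions the squared radius decreases linearly in time, and the sphere collapses to a point at time $t = \rho(0)^2/2$.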

Some years ago, Richard Hamilton made the funda- 
mental observation that Ricci flow is an excellent tool 
for simplifying the structure of a manifold: generally 
speaking, it compresses all the positive-curvature parts 
of the manifold into nothingness, while expanding the 
negative-curvature parts of the manifold until they 
become very homogeneous, in the sense that the man- 
ifold begins to look much the same no matter which 
vantage point one selects inside it. Indeed, the flow 
seems to separate the manifold into extremely symmet- 
ric components. For instance, in two dimensions the 
Ricci flow always ends up endowing the manifold with 
a metric of constant curvature, which could be positive 
(as in the sphere), zero (as in the cylinder), or negative 
(as in hyperbolic space); the fact that such a constant-curvature
metric can always be found is known as
the uniformization theorem [V.37] and is of fundamental
importance in the theory of surfaces. In higher
dimensions, the Ricci flow can develop singularities
before perfect symmetry is attained, but it turns out
that it is possible to perform “surgeries” (see differ- 
ential topology [IV.9 §§2.3, 2.4]) on the singularities 
that develop this way, so that the manifold becomes 
smooth again and one can restart the Ricci flow pro- 
cess. (The surgery may, however, change the topology 
of the manifold: for instance, it can convert a connected 
manifold into two disconnected pieces.) In three dimen- 
sions it has recently been shown by Perelman that Ricci 
flow, when augmented by surgery to remove the sin- 
gularities, does indeed convert an arbitrary manifold 
(obeying some mild assumptions) into a finite union of 
some very symmetric and explicitly describable pieces; 
the precise statement of this conclusion was known as 
the geometrization conjecture of Thurston. One consequence
of this conjecture, which is now a rigorous theorem
proved by Perelman, is the Poincaré conjecture
[V.28]: any compact three-dimensional manifold that is
simply connected (meaning that any closed loop on the
manifold can be contracted smoothly to a point without
ever leaving the manifold) can in fact be smoothly
deformed into a 3-sphere (which is to four-dimensional 
Euclidean space as the usual two-dimensional sphere 
is to three-dimensional Euclidean space). The proof of 
Poincaré’s conjecture is one of the most impressive 
recent achievements of modern mathematics. 


III.81 Riemann Surfaces 

Alan F. Beardon 


Let D be a region (that is, a connected open set) in 
the complex plane. If f is a complex-valued function
defined on D, then we can define its derivative just as
we would for real-valued functions defined on subsets
of $\mathbb{R}$: the derivative of f at w is the limit as z tends to
w of the “difference quotient” (f(z) - f(w))/(z - w).
Of course, this limit does not necessarily exist, but if it
exists for every w in D, then f is said to be analytic,
or holomorphic, on D. Analytic functions have amazing
properties; for example, if a function is analytic in a
region, then it automatically has a Taylor-series expansion
at each point of the region, from which one can
deduce that it is infinitely differentiable. This is in stark
contrast to the theory of real functions of a real variable,
where, for example, a function may be once differentiable
but not twice differentiable at some point
x, yet three-times differentiable at some other point y.
Complex analysis is the study of analytic functions. Perhaps
more than any other mathematical topic, it is both
immensely useful in a practical sense and profound
and beautiful in a theoretical sense. (See also SOME
FUNDAMENTAL MATHEMATICAL DEFINITIONS [1.3 §5.6].)
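
The demand that the difference quotient have a limit is stronger than it looks, because z may approach w from any direction in the plane. The following numerical sketch is an illustration added here, with hypothetical function names and an arbitrarily chosen point w = 1 + 2i: it compares the analytic function $z^2$ with complex conjugation, which is not analytic.

```python
def difference_quotient(f, w, h):
    """(f(z) - f(w)) / (z - w) with z = w + h."""
    return (f(w + h) - f(w)) / h


def square(z):
    return z * z


def conj(z):
    return z.conjugate()


w = 1 + 2j
# Approach w along the real axis, the imaginary axis, and a diagonal.
steps = [1e-6, 1e-6j, 1e-6 * (1 + 1j)]

square_quotients = [difference_quotient(square, w, h) for h in steps]
conj_quotients = [difference_quotient(conj, w, h) for h in steps]

print(square_quotients)  # all close to 2w = 2 + 4j
print(conj_quotients)    # close to 1, -1 and -1j respectively
```

For $z^2$ the three quotients agree to within the step size, whereas for conjugation the value depends on the direction of approach, so no limit exists.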

In general, group theorists do not distinguish be- 
tween isomorphic groups, and topologists do not 
distinguish between homeomorphic spaces. Similarly, 
complex analysts do not distinguish between two 
regions D and D' if there is an analytic bijection 
between D and D'. When this is the case, we say that 
D and D' are conformally equivalent. Conformal equivalence
is, as its name suggests, an equivalence relation
[1.2 §2.3]: the proof depends on the surprising fact
that if f is an analytic bijection from D to D', then
its inverse $f^{-1} : D' \to D$ is also analytic. Again, this
contrasts with real analysis. If D and D' are conformally
equivalent, then “interesting” properties of analytic
functions on D are transferred automatically to
corresponding properties of analytic functions defined
on D'. Indeed, this statement can almost be taken as a 
definition of “interesting” properties (although admit- 
tedly this conflicts with the numerical side of com- 
plex analysis, because purely numerical statements do 
not usually transfer under such maps). Naturally, we 
would like to know which properties of analytic functions
are “interesting” in this sense. One such property
is that (except at certain isolated points) the angle
between two intersecting curves in D is preserved
under an analytic map: this is the origin of the term
“conformal.” It is less well-known that if a bijection 
(which is not assumed to be differentiable) preserves 
the angles between curves (that is, both their magni- 
tude and whether they are measured clockwise or coun- 
terclockwise), then it is analytic. Thus, loosely speak- 
ing, the preservation of angles implies the existence of 
a Taylor series! 

The impact of complex analysis on other topics is 
so great that it is natural to try to find the most gen- 
eral type of surface on which we can study analytic 
functions. This leads to the definition of a Riemann 
surface (after Bernhard Riemann [VI.49], who introduced
the idea in his doctoral dissertation). In order
to put a coordinate system on a surface S we try to 
map S bijectively onto a plane region D; if we succeed, 
then we can transfer the coordinates from D to S. For 
many surfaces (for example, a sphere) it is not possible 
to find such a map, and we have to be satisfied with 
local coordinates. This means that at each point w of 
S, we map a neighborhood N of w onto a plane region, 
and so obtain coordinates that are restricted to N. As 
there are usually infinitely many ways to do this, we are 
forced to consider the class of transition maps; that is,
the maps from one coordinate system at w to another. 
The surface is a Riemann surface precisely when each 
such transition map is an analytic bijection. This defi- 
nition resembles that of a two-dimensional manifold 
[I.3 §6.9], but the requirement that the transition maps
should be analytic is much stronger, so by no means 
every 2-manifold is a Riemann surface. 

It is not difficult to construct Riemann surfaces. Con- 
sider, for example, a sphere S resting on a horizontal 
table. If we imagine a light source at the highest point 
P of the sphere, then each point of S except P casts a 
“shadow” on the table: since the table has a simple coor- 
dinate system, we can use these “shadows” to define a 
coordinate system on all of S except the point P. Sim- 
ilarly, a light source at the point Q of tangency with 
the table casts a shadow onto the (horizontal) tangent 
plane at P, and this gives a coordinate system valid 
throughout S except at Q. It can be shown that if the 
second coordinate system is composed with a reflec- 
tion, then the sphere does have the structure of a Rie- 
mann surface. This is an extremely important exam- 
ple, because it allows one to handle questions involving 
infinity in a satisfactory way; it is known as the Riemann 
sphere. 
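
The claim about the two shadow coordinate systems can be tested numerically. The following Python sketch is an illustration only, under assumptions that are not in the text: the sphere is taken to have diameter 1 and to touch the table at the origin, coordinates in each plane are read as complex numbers, and all function names are hypothetical. It checks that if z is the first coordinate of a point and w is the complex conjugate (the reflection) of its second coordinate, then w = 1/z, which is an analytic bijection away from 0.

```python
import math

def sphere_point(theta, phi):
    """Point on the sphere of diameter 1 resting on the table z = 0 at the origin."""
    r = 0.5
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r + r * math.cos(theta))

def shadow_from_top(pt):
    """Shadow on the table cast by a light at P = (0, 0, 1), as a complex number."""
    x, y, t = pt
    return complex(x, y) / (1 - t)

def shadow_from_bottom(pt):
    """Shadow on the tangent plane at P cast by a light at Q = (0, 0, 0)."""
    x, y, t = pt
    return complex(x, y) / t

def max_transition_error(samples=50):
    """Largest deviation of conj(second coordinate) from 1 / (first coordinate)."""
    worst = 0.0
    for i in range(1, samples):          # skip the two poles P and Q
        for j in range(samples):
            pt = sphere_point(math.pi * i / samples, 2 * math.pi * j / samples)
            z = shadow_from_top(pt)
            w = shadow_from_bottom(pt).conjugate()   # the reflection
            worst = max(worst, abs(w - 1 / z))
    return worst
```

Calling `max_transition_error()` should return a value at the level of rounding error. Without the conjugation the transition map would be the conjugate of 1/z, which preserves the size of angles but reverses their sense, and so is not analytic.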

For another example, consider a cube C, and (for simplicity
only) remove the eight vertices. Given a face F of C (without
its bounding edges), we can find a Euclidean rigid motion that
maps F into the complex plane $\mathbb{C}$, so we can easily define
a coordinate system on F. If w is an interior point of
an edge E of C, we can “open” the two faces that meet
at E to make a planar region that contains E, and then
map this region into $\mathbb{C}$ by a Euclidean rigid motion. In
this way we see that C (less its vertices) is a Riemann 
surface. The problem with the vertices can be solved by 
technical means, and this method can then be general- 
ized to show that any polyhedron (even one with holes, 
such as a “square” torus) is a Riemann surface. These 
are known as compact surfaces. It is a deep but fascinat- 
ing classical result that each such surface corresponds 
bijectively to an irreducible polynomial P(z,w) in two 
complex variables. To give an idea of how the correspondence
works, let us consider an equation such as
$w^3 + wz + z^2 = 0$. For each z this can be solved to give
three values of w, say $w_1$, $w_2$, and $w_3$; as we allow z
to vary in $\mathbb{C}$, the values $w_j$ vary, and as they do so they
create a Riemann surface W, which can be shown to
be connected. This surface can be thought of as lying
“above” $\mathbb{C}$, and for all but a finite set of z in $\mathbb{C}$ there are
exactly three points on W that are “above” z.
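
To make “all but a finite set” concrete for this particular cubic, one can compute where two of the three values of w collide. The following worked computation is an added illustration, not part of the classical result quoted above; it uses the standard discriminant $-4p^3 - 27q^2$ of a cubic of the form $w^3 + pw + q$.

```latex
\[
  w^3 + zw + z^2 = 0, \qquad p = z, \quad q = z^2,
\]
\[
  \operatorname{disc}_w = -4p^3 - 27q^2 = -4z^3 - 27z^4 = -z^3(4 + 27z).
\]
```

For this example, then, roots coincide only at $z = 0$ and $z = -4/27$; above every other z there are exactly three distinct points of W.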

As we have mentioned, Riemann surfaces are impor- 
tant because they are the most general surfaces on 
which one can study analytic functions, with all of their 
remarkable properties. It is easy to define what we
mean by an analytic function f on a Riemann surface R.
Given a coordinate system on part of R, we can think of
f as a function of the coordinates, and we then regard
f as analytic if and only if it depends analytically on the
coordinates. Because the transition maps are analytic,
f will be analytic with respect to one coordinate system
if and only if it is analytic with respect to all the other
coordinate systems defined at the point in question.

This simple property (that if something holds in one
coordinate system, then it holds in all of them) is one
of the crucial features of the theory. For example, suppose
that we have two curves crossing on an (abstract)
Riemann surface. If we transfer the two curves to plane
regions using different local coordinate systems at the
crossing point, and then measure the angle of intersection
in each case, we must get the same result (since
the transition from one coordinate system to another 
preserves angles). It follows that the angle between 
intersecting curves on an abstract Riemann surface is 
a well-defined concept. 

It turns out that analysis on Riemann surfaces goes 
beyond analytic functions. Harmonic functions (solutions
of Laplace’s equation [I.3 §5.4]) are intimately
connected to analytic functions, since the real part of an
analytic function is harmonic and any harmonic func- 
tion is (locally) the real part of an analytic function. 
Thus, on a Riemann surface, complex analysis merges 
almost imperceptibly with potential theory (which is 
the study of harmonic functions). 
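
The first half of this connection follows in one line from the Cauchy–Riemann equations. As a short worked check (the symbols u, v, x, and y are introduced here purely for illustration), write an analytic function as $f = u + iv$ with $z = x + iy$:

```latex
\[
  u_x = v_y, \quad u_y = -v_x
  \quad\Longrightarrow\quad
  u_{xx} + u_{yy} = v_{yx} - v_{xy} = 0 .
\]
\[
  \text{Example: } f(z) = z^2 = (x^2 - y^2) + 2ixy, \qquad
  \frac{\partial^2}{\partial x^2}(x^2 - y^2)
  + \frac{\partial^2}{\partial y^2}(x^2 - y^2) = 2 - 2 = 0 .
\]
```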

Perhaps the most profound theorem of all about Rie- 
mann surfaces is the uniformization theorem [V.37]. 
Roughly speaking, this says that every Riemann surface 
is obtained from either Euclidean, spherical, or hyperbolic
geometry (see some fundamental mathematical
definitions [I.3 §§6.2, 6.5, 6.6]) by taking a polygon
in that geometry and gluing its sides together, in
the same way that one obtains a torus by gluing opposite
sides of a rectangle together. (See also Fuchsian
groups [III.28].) Remarkably, only very few Riemann
surfaces come from the Euclidean or spherical geome- 
tries; essentially, every Riemann surface can be con- 
structed in this way from (and only from) the hyper- 
bolic plane. This means that virtually every region in 
the complex plane comes equipped with a natural and 
intrinsic geometry whose character is hyperbolic and 
not, as one might expect, Euclidean. The Euclidean character
of a generic plane region comes from its embedding
in $\mathbb{C}$, and not from its own intrinsic hyperbolic
geometry. 


III.82 The Riemann Zeta Function 


The Riemann zeta function $\zeta$ is a function defined on
the complex numbers that encapsulates in a remarkable
way many of the most important properties about
the distribution of prime numbers. If s is a complex
number with real part greater than 1, then $\zeta(s)$ is
defined to be $\sum_{n=1}^{\infty} n^{-s}$. The condition that Re(s) > 1
is needed to ensure that this series converges. However,
because the resulting function is holomorphic
[I.3 §5.6], it is possible to extend the definition by
means of analytic continuation. The result is a function
that is defined everywhere on the complex plane
(though it takes the value $\infty$ at 1).

A first clue to why this function is related to the 
distribution of primes is Euler’s product formula: 

$$\zeta(s) = \prod_{p} (1 - p^{-s})^{-1}.$$

Here, the product on the right-hand side is over all 
primes. The formula can be proved by writing
$(1 - p^{-s})^{-1}$ as $1 + p^{-s} + p^{-2s} + \cdots$, expanding out the product,
and using the fundamental theorem of arithmetic
[V.16]. Deeper connections were discovered by
Riemann [VI.49], who formulated the famous Riemann
hypothesis [IV.4 §3].
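
Euler's product formula is easy to test numerically for a real value of s greater than 1. The sketch below is only an illustration: the truncation points and the function names are arbitrary choices made here, and both quantities are finite approximations to the infinite series and the infinite product.

```python
import math

def zeta_partial_sum(s, terms=200_000):
    """Partial sum of the series sum_{n>=1} n^(-s), meaningful for s > 1."""
    return sum(n ** -s for n in range(1, terms + 1))

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def euler_product(s, limit=200_000):
    """Truncated product of (1 - p^(-s))^(-1) over the primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1 - p ** -s
    return product
```

For s = 2 both `zeta_partial_sum(2)` and `euler_product(2)` should lie close to $\pi^2/6$, the known value of $\zeta(2)$.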

The Riemann zeta function is just one of a family 
of functions that encode important number-theoretic 
information. For example, Dirichlet L-functions are 
closely related to the distribution of primes in arith- 
metic progressions. For more details about these and 
about the Riemann zeta function itself, see analytic 
number theory [IV.4]. Some more sophisticated zeta 
functions are described in the Weil conjectures
[V.38]. See also L-functions [III.49]. 


III.83 Rings, Ideals, and Modules 


1 Rings 

A ring, like a group [I.3 §2.1] or a field [I.3 §2.2], is
an algebraic structure that satisfies certain axioms. To 
remember the axioms for both rings and fields at the 
same time, it is helpful to think of two simple examples: 
with the two operations of addition and multiplication, 
the set Z of all integers forms a ring and the set Q of
all rational numbers forms a field. In general, a ring is a
set R with two binary operations [I.2 §2.4], denoted
by “+” and “×”, which satisfies all the field axioms apart
from the one that says that nonzero elements have 
multiplicative inverses. 

Although the integers are the prototypical example 
of a ring, the notion arose historically as an abstraction 
from several sources, one of which was polynomials. 
Like integers, polynomials (with real coefficients, say) 
can be added and multiplied, and these operations have
all the properties one might expect, such as the fact
that multiplication is distributive over addition, so the 
space of such polynomials forms a ring. Other exam- 
ples include the integers modulo n (for any positive 
integer n), the rationals (or indeed any other field), and 
the set Z[i] of all complex numbers a + bi such that a 
and b are integers. 

Sometimes the assumptions that multiplication is 
commutative and has an identity element are dropped. 
This leads to a more complicated theory, but it does 
encompass important examples such as the set of all 
n × n matrices (with elements in a given field, or even
just a ring). 

As with other algebraic structures, there are several 
ways of forming new rings from old ones: for instance, 
we can take subrings and direct products of two rings. 
Slightly less obviously, we can start with a ring R and
form the ring of all polynomials with coefficients in R.
We can also take quotients [I.3 §3.3], but in order to
discuss these we must introduce the notion of an ideal. 

2 Ideals 

A typical quotient construction for an algebraic struc- 
ture A will identify some substructure B and regard two 
elements of A as “equivalent” if they “differ by an element
of B.” If A is a group or a vector space [I.3 §2.3],
then B will be a subgroup or a subspace. However, the 
situation for rings is slightly different. 

We can see why if we think about quotients in another 
way: as images of homomorphisms [I.3 §4.1]. The substructures
that we like to quotient by are the kernels
of these homomorphisms, so we should ask ourselves 
what the kernel of a ring homomorphism (that is, the 
set of elements that map to 0) will be like. 

If $\phi : R \to R'$ is a homomorphism between two rings,
and $\phi(a) = \phi(b) = 0$, then $\phi(a + b) = 0$. Also, if
r is any element of R, then $\phi(ra) = \phi(r)\phi(a) = 0$.
Thus, the kernel of a homomorphism is closed under 
addition, and also under multiplication by any element 
of the ring. These two properties define the notion of an 
ideal. For example, the set of all even integers is an ideal 
in Z. In interesting cases, ideals are not subrings, since 
if an ideal contains 1 then it must contain r for every 
r in the ring. (An example that makes the difference 
very clear is the subset of the ring of all polynomials 
that consists of all constant polynomials. The constants 
form a subring, but they certainly do not form an ideal.) 

It is not hard to show that for any ideal I in a ring 
R there is a homomorphism that has I as its kernel, 
namely the quotient map from R to the quotient R/I. 
Here R/I is a construction that as usual we think of as 
“R, but with two elements considered the same if they 
differ by an element of I.”

Quotients of rings are extremely useful in alge- 
braic number theory [IV.3] because they allow us to 
rephrase questions about algebraic numbers as ques- 
tions about polynomials. To get an idea of how this is 
done, consider the ring Z[X] of all polynomials with 
integer coefficients, and the ideal that consists of all 
multiples (by integer polynomials) of the polynomial 
$X^2 + 1$. In the quotient of Z[X] by this ideal, we regard
two polynomials as the same if they differ by a multiple
of $X^2 + 1$. In particular, $X^2$ is the same as $-1$. In other
words, in this quotient ring we have a square root of
$-1$, and in fact the quotient ring is isomorphic to the
ring Z[i] that we met earlier. 
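
To see this quotient in action, one can compute with residues directly. The sketch below is an illustration only, and the representation is an assumption made here: a residue class is stored as the pair (a, b) standing for a + bX, and the function names are hypothetical. Reducing modulo $X^2 + 1$ just means replacing $X^2$ by $-1$ wherever it appears.

```python
def reduce_mod_x2_plus_1(coeffs):
    """Reduce an integer polynomial (coefficients listed from degree 0 up)
    modulo X^2 + 1, returning the pair (a, b) that represents a + b*X."""
    a = b = 0
    for k, c in enumerate(coeffs):
        # X^k is congruent to 1, X, -1, -X, 1, ... because X^2 = -1 in the quotient.
        sign = -1 if (k // 2) % 2 else 1
        if k % 2 == 0:
            a += sign * c
        else:
            b += sign * c
    return (a, b)

def multiply(p, q):
    """Multiply the residues a + b*X and c + d*X in Z[X]/(X^2 + 1)."""
    (a, b), (c, d) = p, q
    # (a + bX)(c + dX) = ac + (ad + bc)X + bd X^2, then reduce.
    return reduce_mod_x2_plus_1([a * c, a * d + b * c, b * d])

X = (0, 1)  # the residue class of the polynomial X
```

Here `multiply(X, X)` gives the class of $-1$, and products of pairs agree with products of the corresponding Gaussian integers a + bi, which is the isomorphism with Z[i] in computational form.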


One of the things we like to do to integers is fac- 
torize them, and we can try to do the same in rings 
as well. However, it turns out that, while it is usually 
possible to factorize an element of a ring into “irre- 
ducible” ones that cannot be factorized further (like 
the primes in Z), in many cases the factorization is not 
unique. At first, this might be rather unexpected, and 
indeed it was a stumbling block for many early workers 
(in the eighteenth and nineteenth centuries). Here is an 
example: in the ring $\mathbb{Z}[\sqrt{-3}]$, which consists of all complex
numbers $a + b\sqrt{-3}$, where a and b are integers,
the number 4 may be factorized as 2 × 2 and also as
$(1 + \sqrt{-3}) \times (1 - \sqrt{-3})$.
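
Both factorizations, and the fact that none of the factors can be split further, can be checked with the norm $N(a + b\sqrt{-3}) = a^2 + 3b^2$, which is multiplicative. The following sketch is an added illustration under assumptions made here: elements are stored as integer pairs (a, b), and the function names are hypothetical. A proper factor of an element of norm 4 would need norm 2, so it suffices to observe that no element has norm 2.

```python
def multiply(u, v):
    """Product of a + b*sqrt(-3) and c + d*sqrt(-3), stored as integer pairs."""
    (a, b), (c, d) = u, v
    return (a * c - 3 * b * d, a * d + b * c)

def norm(u):
    """Multiplicative norm N(a + b*sqrt(-3)) = a^2 + 3*b^2."""
    a, b = u
    return a * a + 3 * b * b

def elements_of_norm(n):
    """All elements of the ring with norm exactly n (a finite search)."""
    bound = int(n ** 0.5) + 1
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm((a, b)) == n]
```

Since `elements_of_norm(2)` is empty and the only elements of norm 1 are 1 and $-1$, the elements 2, $1 + \sqrt{-3}$, and $1 - \sqrt{-3}$ are irreducible and no two of them differ by a unit, so the two factorizations of 4 are genuinely different.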

3 Modules 

Modules are to rings as vector spaces are to fields. In
other words, they are algebraic structures where the
basic operations are addition and scalar multiplication,
but now the scalars are allowed to come from a ring
rather than a field. For an example of a module over
a ring that is not a field, take any Abelian group G.
This can be turned into a module over Z, with addition
given by the group operation and scalar multiplication
defined in the obvious way: for instance, 3g means
g + g + g, and −2g means the inverse of g + g.

The simplicity of this definition masks the fact that
the structure of modules is in general far more subtle
than that of vector spaces. For example, we can define
a basis of a module to be a linearly independent set of
elements that spans the module. However, many useful
facts about bases in vector spaces do not hold for
modules. For instance, in Z, which we may consider as
a module over itself, the set {2,3} spans the module
but does not contain a basis, and similarly the set {2} is
linearly independent but cannot be extended to a basis.
In fact, modules may be very far from having a basis:
for example, if we consider the integers modulo n as a
module over Z, then even a single element x fails to be
linearly independent, since nx = 0.
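
The statements about {2,3} can be verified mechanically. The sketch below is an illustration with hypothetical function names; it relies on the standard fact that the integer combinations of a set of integers are exactly the multiples of their greatest common divisor, so the set spans Z precisely when that divisor is 1.

```python
from functools import reduce
from math import gcd

def spans_Z(generators):
    """A set of integers spans Z as a Z-module exactly when its gcd is 1."""
    return reduce(gcd, generators, 0) == 1

def dependence_relation(a, b):
    """For nonzero integers a and b, return (r, s), not both zero,
    with r*a + s*b == 0, showing that {a, b} is linearly dependent."""
    return (b, -a)
```

With these, `spans_Z([2, 3])` holds while `spans_Z([2])` and `spans_Z([3])` fail, and `dependence_relation(2, 3)` returns the relation 3·2 − 2·3 = 0. So {2,3} spans Z and is not itself linearly independent, while neither of its one-element subsets spans, which is why it contains no basis.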

The following example of a module is an important 
one. Let V be a complex vector space and let $\alpha$ be a linear
map from V to V. Then V can be made into a module
over the ring $\mathbb{C}[X]$: if $v \in V$ and P is a complex polynomial,
then Pv is defined to be $P(\alpha)v$. (For instance, if P
is the polynomial $X^2 + 1$, then $Pv = \alpha^2 v + v$.) Applying
general structural results about modules to this example,
one obtains a proof of the Jordan normal form
theorem [III.45].


III.84 Schemes

Jordan S. Ellenberg 


One frequently finds in the history of mathematics 
that a definition thought to be completely general 
was in fact too restrictive to treat certain problems
of interest. The notion of “number,” for instance, has 
been expanded again and again — most notably to incor- 
porate irrationalities and complex numbers, the for- 
mer arising from problems in geometry and the lat- 
ter needed in order to describe solutions to arbitrary 
algebraic equations. In a similar way, algebraic geom- 
etry, which was once understood as the study of alge- 
braic varieties, or solution sets of algebraic equations 
in some finite-dimensional space, has grown to encom- 
pass more general objects known as “schemes.” As a 
very meager example one might consider the two equations
$x + y = 0$ and $(x + y)^2 = 0$. The two equations
have the same set of solutions in the plane, so they 
describe the same variety; but the schemes attached to 
the two objects are completely different. The reformu- 
lation of algebraic geometry in the language of schemes 
was a tremendous project spearheaded by Alexander 
Grothendieck in the 1960s. As the above example sug- 
gests, the scheme-theoretic viewpoint tends to empha- 
size the algebraic aspects of the subject (equations) 
rather than the traditionally geometric ones (solution 
sets of equations). This viewpoint has made a reality 
of the long-hoped-for unification of algebraic num- 
ber theory [IV.3] and algebraic geometry, and, indeed, 
much recent progress in number theory would have
been impossible without the geometric insight supplied 
by the theory of schemes. 

Even schemes are not enough to handle all the 
problems of current interest, and still more general 
notions (stacks, “noncommutative varieties,” derived 
categories of sheaves, etc.) are brought to bear when 
necessary. These can appear exotic, but to our suc- 
cessors they will no doubt be second nature, just as 
schemes are to us. For more on algebraic geometry in 
general, see algebraic geometry [IV.7]. Schemes are 
discussed at greater length in arithmetic geometry 
[IV.6].


III.85 The Schrodinger Equation 

Terence Tao 


In mathematical physics, the Schrodinger equation
(and the closely related Heisenberg equation) are the
most fundamental equations in nonrelativistic quantum
mechanics, playing the same role as Hamilton’s
laws of motion (and the closely related Poisson equation)
in nonrelativistic classical mechanics. (In relativistic
quantum mechanics, the equations of quantum field
theory take over the role of Heisenberg’s equation,
while Schrodinger’s equation does not have a natural
direct analogue.) In pure mathematics, the Schrodinger
equation, together with its variants, is one of the basic
equations studied in the field of partial differential
equations [IV.16], and has applications to geometry,
to spectral and scattering theory, and to integrable
systems. 

The Schrodinger equation can be used to describe the 
quantum dynamics of many-particle systems under the 
influence of a variety of forces, but for simplicity let us 
consider just a single particle, of mass m > 0, moving 
about in n-dimensional space $\mathbb{R}^n$ subject to the influence
of a potential, which we shall take to be a function
$V : \mathbb{R}^n \to \mathbb{R}$. To avoid technicalities we shall assume that
all the functions we discuss are smooth.

In classical mechanics, this particle would have a
specific position $q(t) \in \mathbb{R}^n$ and a specific momentum
$p(t) \in \mathbb{R}^n$ for each time t. (Eventually we shall observe
the familiar law $p(t) = mv(t)$, where $v(t) = q'(t)$ is
the velocity of the particle.) Thus the state of this system
at any given time t is described by the element
$(q(t), p(t))$ of the space $\mathbb{R}^n \times \mathbb{R}^n$, which is known as
phase space. The energy of this state is described by
the Hamiltonian function [III.35] $H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$
on phase space, defined in this case by

$$H(q,p) = \frac{|p|^2}{2m} + V(q).$$

(Physically, the quantity $|p|^2/2m = \frac{1}{2}m|v|^2$ represents
kinetic energy, while $V(q)$ represents potential energy.)
The system then evolves according to Hamilton’s equations
of motion:

$$q'(t) = \frac{\partial H}{\partial p}, \qquad p'(t) = -\frac{\partial H}{\partial q}, \tag{1}$$


where we keep in mind that p and q are vectors, so that 
these derivatives are gradients [I.3 §5.3]. Hamilton’s
equations of motion are valid for any classical system, 
but in our specific case of a particle in a “potential well,” 
they become 

$$q'(t) = \frac{1}{m}p(t), \qquad p'(t) = -\nabla V(q). \tag{2}$$

The first equation is asserting that $p = mv$, while the
second equation is basically Newton’s second law of 
motion.

From (1) we can easily derive Poisson’s equation of
motion

$$\frac{d}{dt}A(q(t),p(t)) = \{H,A\}(q(t),p(t)) \tag{3}$$

for any classical observable $A : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, where

$$\{H,A\} = \frac{\partial H}{\partial p} \cdot \frac{\partial A}{\partial q} - \frac{\partial A}{\partial p} \cdot \frac{\partial H}{\partial q}$$

is the Poisson bracket of H and A. Setting A = H, we
have in particular the conservation-of-energy law

$$H(q(t),p(t)) = E \tag{4}$$

for all $t \in \mathbb{R}$ and some quantity E independent of t.
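
The equations of motion for a particle in a potential, and the resulting conservation of energy, can be observed in a small simulation. The sketch below is an illustration only, and everything specific in it is an assumption made here rather than something taken from the text: the dimension is 1, the mass is 1, the potential is $V(q) = q^2/2$, the numerical scheme is the standard leapfrog method, and the function names are hypothetical. Because the scheme is only approximate, the energy is conserved up to a small discretization error rather than exactly.

```python
import math

def grad_V(q):
    """Gradient of the illustrative potential V(q) = q^2 / 2 (an assumption)."""
    return q

def hamiltonian(q, p, m=1.0):
    """H(q, p) = |p|^2 / (2m) + V(q) for the potential above."""
    return p * p / (2 * m) + q * q / 2

def evolve(q, p, m=1.0, dt=1e-3, steps=10_000):
    """Integrate q' = p/m, p' = -grad V(q) with the leapfrog scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * grad_V(q)   # half step in momentum
        q += dt * p / m             # full step in position
        p -= 0.5 * dt * grad_V(q)   # half step in momentum
    return q, p

def exact_position(t, q0=1.0):
    """Exact solution q(t) = q0 cos(t) for this potential with m = 1, p(0) = 0."""
    return q0 * math.cos(t)
```

Starting from q = 1 and p = 0, the value of `hamiltonian` after `evolve` should stay very close to its initial value 1/2, and the computed position should track `exact_position(10.0)`.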

Now we analyze the quantum mechanical analogue of
the above classical system. We need a small¹ parameter
$\hbar > 0$, known as Planck’s constant. The state of the particle
at a time t is no longer described by a single point
$(q(t),p(t))$ in phase space, but is instead described by
a wave function, which is a complex-valued function
of position that evolves over time: that is, for each t
we have a function $\psi(t)$ from $\mathbb{R}^n$ to $\mathbb{C}$. It is required
to obey the normalization condition $\langle \psi(t), \psi(t) \rangle = 1$,
where $\langle \cdot, \cdot \rangle$ denotes the inner product

$$\langle \phi, \psi \rangle = \int_{\mathbb{R}^n} \phi(q) \overline{\psi(q)} \, dq.$$

Unlike a classical particle, a wave function $\psi(t)$ does
not necessarily have a specific position $q(t)$. However,
it does have an average position $\langle q(t) \rangle$, defined as

$$\langle q(t) \rangle = \langle Q\psi(t), \psi(t) \rangle = \int_{\mathbb{R}^n} q \, |\psi(t,q)|^2 \, dq.$$

Here, we have written $\psi(t,q)$ for the value of $\psi(t)$ at
the point q, and Q is the position operator, defined
by $(Q\psi)(t,q) = q\psi(t,q)$: that is, Q is the operator
that multiplies pointwise by q. Similarly, while $\psi$ does
not have a specific momentum $p(t)$, it does have an
average momentum $\langle p(t) \rangle$, defined as

$$\langle p(t) \rangle = \langle P\psi(t), \psi(t) \rangle = \int_{\mathbb{R}^n} \frac{\hbar}{i} (\nabla_q \psi(t,q)) \overline{\psi(t,q)} \, dq,$$

where the momentum operator P is defined by Planck’s
law

$$P\psi(t,q) = \frac{\hbar}{i} \nabla_q \psi(t,q).$$

Note that the vector $\langle p(t) \rangle$ is real-valued because all the
components of P are self-adjoint [III.52 §3.2]. More
generally, given any quantum observable, by which
we mean a self-adjoint operator [III.52] A acting on
the space $L^2(\mathbb{R}^n)$ of complex-valued square integrable
functions, we can define the average value $\langle A(t) \rangle$ of A
at time t by the formula

$$\langle A(t) \rangle = \langle A\psi(t), \psi(t) \rangle.$$

1. In many applications it is convenient to normalize $\hbar$ (and m) to
equal 1.


The analogue of Hamilton’s equations of motion (1) is 
now the time-dependent Schrodinger equation: 

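$$i\hbar \frac{\partial \psi}{\partial t} = H\psi, \tag{5}$$

where H is now a quantum observable rather than a
classical one. More precisely,

$$H = \frac{|P|^2}{2m} + V(Q).$$

In other words,

$$i\hbar \frac{\partial \psi}{\partial t}(t,q) = H\psi(t,q) = -\frac{\hbar^2}{2m} \Delta_q \psi(t,q) + V(q)\psi(t,q),$$

where

$$\Delta_q \psi = \sum_{j=1}^{n} \frac{\partial^2 \psi}{\partial q_j^2}$$

is the Laplacian of $\psi$. The analogue of Poisson’s equation
of motion (3) is the Heisenberg equation

$$\frac{d}{dt} \langle A(t) \rangle = \left\langle \frac{i}{\hbar} [H, A](t) \right\rangle \tag{6}$$

for any observable A, where $[A, B] = AB - BA$ is the
commutator or Lie bracket of A and B. (The quantity
$(i/\hbar)[A,B]$ is occasionally referred to as the quantum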
Poisson bracket of A and B.) 

If the quantum state $\psi$ oscillates in time according
to the formula $\psi(t,q) = e^{(E/i\hbar)t}\psi(0,q)$ for some
real number E (known as the energy level or eigenvalue),
then one has the time-independent Schrodinger
equation:

$$H\psi(t) = E\psi(t) \quad \text{for all times } t \tag{7}$$


(compare this with (4)). More generally, the impor- 
tant subject of spectral theory provides many links 
between the time-dependent equation (5) and the time- 
independent equation (7). 
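
The passage from (5) to (7) is a one-line computation, spelled out here as a check. Taking the oscillating form $\psi(t,q) = e^{(E/i\hbar)t}\psi(0,q)$, differentiating in time, and substituting into the time-dependent equation gives:

```latex
\[
  i\hbar \frac{\partial \psi}{\partial t}(t,q)
  = i\hbar \cdot \frac{E}{i\hbar}\, e^{(E/i\hbar)t}\,\psi(0,q)
  = E\,\psi(t,q),
\]
\[
  \text{so that (5) becomes } H\psi(t) = E\psi(t).
\]
```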

There are several strong analogies between the equa- 
tions of classical mechanics and those of quantum 
mechanics. For instance, from (6) one has the equations 

$$\frac{d}{dt} \langle q(t) \rangle = \frac{1}{m} \langle p(t) \rangle, \qquad \frac{d}{dt} \langle p(t) \rangle = -\langle \nabla_q V(q)(t) \rangle,$$

which should be compared with (2). Also, given any
classical solution $t \mapsto (q(t), p(t))$ to Hamilton’s equation
of motion, one can construct a corresponding
family of approximate solutions $\psi(t)$ to Schrodinger’s
equation, for instance by the formula²

$$\psi(t,q) = e^{(i/\hbar)L(t)} e^{(i/\hbar)p(t) \cdot (q - q(t))} \varphi(q - q(t)),$$

where

$$L(t) = \int_0^t \left( \frac{|p(s)|^2}{2m} - V(q(s)) \right) ds$$

is the classical action and $\varphi$ is any slowly varying
function that is normalized in the sense that
$\int_{\mathbb{R}^n} |\varphi(q)|^2 \, dq = 1$. One can verify that $\psi$ solves (5)
except for some errors that are small when $\hbar$ is small.
In physics, this fact is an example of the correspondence
principle, which asserts that classical mechanics
can be used to approximate quantum mechanics accurately
if Planck’s constant is small and one is working
at macroscopic scales (which is what allows us to use
slowly varying functions $\varphi$). In mathematics (and more
precisely in the fields of microlocal analysis and semiclassical
analysis), there are a number of formalizations
of this principle that allow us to use knowledge about 
the behavior of Hamilton’s equations of motion in order 
to analyze the Schrodinger equation. For example, if the 
classical equations of motion have periodic solutions, 
then the Schrodinger equation often has nearly peri- 
odic solutions, whereas if the classical equations have 
very chaotic solutions, then the Schrodinger equation 
typically does as well (this phenomenon is known as 
quantum chaos or quantum ergodicity). 

There are many aspects of the Schrodinger equation
that are of interest. We mention just one of them
here for illustration, namely that of scattering theory.
If the potential function V decays sufficiently quickly
at infinity, and $k \in \mathbb{R}^n$ is a nonzero frequency vector,
then, setting the energy level as $E = \hbar^2|k|^2/2m$,
the time-independent Schrodinger equation $H\psi = E\psi$
admits solutions $\psi(q)$ that behave asymptotically (as
$|q| \to \infty$) as

$$\psi(q) \approx e^{ik \cdot q} + f\!\left( \frac{q}{|q|}, k \right) \frac{e^{i|k||q|}}{|q|^{(n-1)/2}}$$


for some canonical function $f : S^{n-1} \times \mathbb{R}^n \to \mathbb{C}$, which is
known as the scattering amplitude function. This scattering
amplitude depends (in a nonlinear fashion) on
the potential V, and the map from V to f is known as
the scattering transform. The scattering transform can
be viewed as a nonlinear variant of the Fourier transform
[III.27]; it is connected to many areas of partial
differential equations, such as the theory of integrable
systems.

2. Intuitively, this function $\psi(t,q)$ is localized in position near $q(t)$
and localized in momentum near $p(t)$, and is thus localized near
$(q(t),p(t))$ in phase space. Such a localized function, exhibiting such
“particle-like” behavior as having a reasonably well-defined position
and velocity, is sometimes known as a “wave packet.” A typical solution
of the Schrodinger equation does not behave like a wave packet,
but can be decomposed as a superposition or linear combination of
wave packets; such decompositions are a useful tool in analyzing
general solutions of such equations.

There are many generalizations and variants of the 
Schrodinger equation; one can generalize to many- 
particle systems, or add other forces such as magnetic
fields or even nonlinear terms. One can also
couple this equation to other physical equations such
as Maxwell’s equations [IV.17 §1.1] of electromagnetism,
or replace the domain $\mathbb{R}^n$ by another space
such as a torus, a discrete lattice, or a manifold. Alternatively,
one could place some impenetrable obstacles
in the domain (thus effectively removing those regions
of space from the domain). The study of all of these
variants leads to a vast and diverse field in both pure
mathematics and in mathematical physics. 


III.86 The Simplex Algorithm 

Richard Weber 


1 Linear Programming 

The simplex algorithm is the preeminent tool for solv- 
ing some of the most important mathematical prob- 
lems arising in business, science, and technology. In 
these problems, which are called linear programs, we 
are to maximize (or minimize) a linear function sub- 
ject to linear constraints. An example is the diet prob- 
lem posed by the U.S. Air Force in 1947: find quantities 
of seventy-seven differently priced foodstuffs (cheese, 
spinach, etc.) to satisfy a man’s minimum daily require- 
ments for nine nutrients (protein, iron, etc.) at least 
cost. Further applications occur in choosing the ele- 
ments of an investment portfolio, rostering an airline’s 
crew, and finding optimal strategies in two-person
games. The study of linear programming has inspired 
many of the central ideas of optimization theory, such 
as duality [III.19], the importance of convexity, and
computational complexity [IV.21].

The input data of a linear program (LP) consists of
two vectors $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$, and an m × n matrix
$A = (a_{ij})$. The problem is to find values for n nonnegative
decision variables, $x_1, \dots, x_n$, to maximize the
objective function $c_1 x_1 + \cdots + c_n x_n$, subject to m constraints,
$a_{i1} x_1 + \cdots + a_{in} x_n \le b_i$, $i = 1, \dots, m$. In the
diet problem, n = 77 and m = 9. In the following simple
example (not a diet problem), n = 2 and m = 3. In
serious real-life problems, n and m can be greater than
100 000.

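$$\begin{aligned}
\text{maximize} \quad & x_1 + 2x_2 \\
\text{subject to} \quad & -x_1 + 2x_2 \le 2, \\
& x_1 + x_2 \le 4, \\
& 2x_1 - x_2 \le 5, \\
& x_1, x_2 \ge 0.
\end{aligned}$$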

The constraints define a feasible region for $(x_1, x_2)$,
a convex polygon that is depicted by the shaded region
“P” in figure 1. The two dotted lines mark those x where
the objective function takes the value 4 and where
it takes the value 6. Clearly, it is maximized at point C.

The general story is similar to that of the example. 
If the feasible region $P = \{x : Ax \le b, x \ge 0\}$ is
nonempty, then it is a convex polytope in $\mathbb{R}^n$, and an
optimal solution can be found at one of its vertices.
It is helpful to introduce “slack variables” $x_3, x_4, x_5 \ge 0$
to take up the slack on the left of the inequality
constraints. We can write

$$\begin{aligned}
-x_1 + 2x_2 + x_3 &= 2, \\
x_1 + x_2 + x_4 &= 4, \\
2x_1 - x_2 + x_5 &= 5.
\end{aligned}$$

We now have three equations in five variables, so we
can set any two of the variables $x_1, \dots, x_5$ equal to 0,
and solve the equations for the other three variables
(or solve a perturbation of them if they happen not
to be independent). There are ten ways to choose two
variables from five. Not all of the ten corresponding
solutions satisfy $x_1, x_2, x_3, x_4, x_5 \ge 0$, but five of them
do. These are called basic feasible solutions (BFSs), and 
correspond to the vertices of P marked O, B, C, D, E. 
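
The count of ten basic solutions, five of them feasible, can be reproduced by brute force. The sketch below is an illustration only: the exact-arithmetic elimination routine and all names are choices made here, not part of the simplex algorithm itself. It sets each pair of variables to zero in turn, solves the remaining 3 × 3 system of the slack form of the example, and records whether the solution is nonnegative.

```python
from fractions import Fraction
from itertools import combinations

# Equality (slack) form of the example: rows are constraints, columns x1..x5.
A = [[-1, 2, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [2, -1, 0, 0, 1]]
b = [2, 4, 5]

def solve_square(M, rhs):
    """Solve M y = rhs exactly by Gauss-Jordan elimination; None if M is singular."""
    n = len(M)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def basic_solutions():
    """Yield (x, feasible) for each choice of two variables set to zero."""
    for zero in combinations(range(5), 2):
        basic = [j for j in range(5) if j not in zero]
        M = [[row[j] for j in basic] for row in A]
        y = solve_square(M, b)
        if y is None:
            continue
        x = [Fraction(0)] * 5
        for j, value in zip(basic, y):
            x[j] = value
        yield x, all(v >= 0 for v in x)
```

Filtering the results of `basic_solutions()` for feasibility and maximizing $x_1 + 2x_2$ over them singles out the point with $x_1 = x_2 = 2$. Enumeration of this kind is only practical for tiny problems, since the number of candidate bases grows combinatorially.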


2 How the Algorithm Works 

George Dantzig invented the simplex algorithm in 1947 
as a means of solving the Air Force’s diet problem men- 
tioned at the start. The word “program” was not yet 
used to mean computer code, but was a military term 
for a logistic plan or schedule. The fundamental fact on
which the algorithm relies is that if an LP has a bounded 
optimal solution, then the optimum value is attained at 
a BFS, i.e., at a vertex (or so-called “extreme point”) of 
the polytope of feasible points, P. Another name for 
the feasible polytope is “simplex,” which is where the 
algorithm gets its name. It works as follows. 

Step 0. Pick a BFS. 

Step 1. Test whether this BFS is optimal. 

If so, stop. If not, go to step 2. 

Step 2. Find a better BFS. 

Repeat from step 1. 

Since there are only finitely many BFSs (i.e., vertices
of P), the algorithm must stop. 

Now that we have an overview, let us look at the 
details. Suppose that at step 0 we pick the BFS of
$x = (x_1, x_2, x_3, x_4, x_5) = (0, 0, 2, 4, 5)$, corresponding
to vertex O. At step 1 we wish to know if the objective
function can be increased if $x_1$ or $x_2$ is increased
from 0. So we write $x_3$, $x_4$, $x_5$, and the objective function
$c^T x$ in terms of $x_1$ and $x_2$, and display this as
dictionary 1.



Dictionary 1

$$\begin{aligned}
x_3 &= 2 + x_1 - 2x_2, \\
x_4 &= 4 - x_1 - x_2, \\
x_5 &= 5 - 2x_1 + x_2, \\
c^T x &= x_1 + 2x_2.
\end{aligned}$$


The last equation in the dictionary shows that we can
increase the value of $c^T x$ by increasing either $x_1$ or $x_2$
from 0. Suppose that we increase $x_2$. The first and second
equations show that $x_3$ and $x_4$ must decrease, and
we cannot increase $x_2$ beyond 1, at which point $x_3 = 0$
and $x_4 = 3$, $x_5 = 6$. Increasing $x_2$ as much as possible,
we complete step 2 and arrive at a new BFS of
$x = (0, 1, 0, 3, 6)$, which is vertex B. Now we are ready
for step 1 again, and so we write $x_2$, $x_4$, $x_5$, and $c^T x$ in
terms of the variables that are now zero, namely $x_1$, $x_3$,
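to give dictionary 2.

Dictionary 2

$$\begin{aligned}
x_2 &= 1 + \tfrac{1}{2}x_1 - \tfrac{1}{2}x_3, \\
x_4 &= 3 - \tfrac{3}{2}x_1 + \tfrac{1}{2}x_3, \\
x_5 &= 6 - \tfrac{3}{2}x_1 - \tfrac{1}{2}x_3, \\
c^T x &= 2 + 2x_1 - x_3.
\end{aligned}$$

Dictionary 3

$$\begin{aligned}
x_1 &= 2 + \tfrac{1}{3}x_3 - \tfrac{2}{3}x_4, \\
x_2 &= 2 - \tfrac{1}{3}x_3 - \tfrac{1}{3}x_4, \\
x_5 &= 3 - x_3 + x_4, \\
c^T x &= 6 - \tfrac{1}{3}x_3 - \tfrac{4}{3}x_4.
\end{aligned}$$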


This shows that $c^T x$ can be increased by increasing
$x_1$ from 0, but that $x_1$ can increase no further than 2
because at that point $x_4 = 0$. This brings us to a new
solution (2, 2, 0, 0, 3), which is vertex C. Once more, we
are ready for step 1, and so compute dictionary 3, now
writing things in terms of $x_3$ and $x_4$, which are 0. The
algorithm now stops because, as we require $x_3, x_4 \ge 0$,
the bottom line of dictionary 3 proves that $c^T x \le 6$ for
all feasible x.
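
The passage from one dictionary to the next is mechanical, and the three dictionaries of the example can be regenerated by a short program. The sketch below is an illustration under assumptions made here: a dictionary is stored as nested Python dicts with exact fractions, the names are hypothetical, and the entering variable is chosen by the largest-coefficient rule, which happens to reproduce the path O, B, C followed in the text. It makes no attempt to handle degenerate problems or cycling.

```python
from fractions import Fraction as F

def pivot(rows, entering, leaving):
    """Return a new dictionary in which `entering` is basic and `leaving` is not.

    `rows` maps each left-hand side (a basic variable or "obj") to a dict
    {name: coefficient}; the key "const" holds the constant term.
    """
    rows = {name: dict(r) for name, r in rows.items()}   # leave the input untouched
    row = rows.pop(leaving)
    a = row.pop(entering)
    # Solve the leaving row for the entering variable.
    solved = {k: -v / a for k, v in row.items()}
    solved[leaving] = 1 / a
    new_rows = {entering: solved}
    for name, r in rows.items():
        coeff = r.pop(entering, F(0))
        for k, v in solved.items():
            r[k] = r.get(k, F(0)) + coeff * v            # substitute into each row
        new_rows[name] = r
    return new_rows

def simplex(rows):
    """Repeat steps 1 and 2 until no objective coefficient is positive."""
    while True:
        obj = rows["obj"]
        candidates = {k: v for k, v in obj.items() if k != "const" and v > 0}
        if not candidates:
            return rows                                   # current BFS is optimal
        entering = max(candidates, key=candidates.get)
        # Ratio test: the basic variable that reaches zero first must leave.
        ratios = {name: -r["const"] / r[entering]
                  for name, r in rows.items()
                  if name != "obj" and r.get(entering, 0) < 0}
        if not ratios:
            raise ValueError("objective is unbounded")
        leaving = min(ratios, key=ratios.get)
        rows = pivot(rows, entering, leaving)

dictionary_1 = {
    "x3": {"const": F(2), "x1": F(1), "x2": F(-2)},
    "x4": {"const": F(4), "x1": F(-1), "x2": F(-1)},
    "x5": {"const": F(5), "x1": F(-2), "x2": F(1)},
    "obj": {"const": F(0), "x1": F(1), "x2": F(2)},
}
```

Running `simplex(dictionary_1)` performs two pivots and returns a dictionary whose objective row has constant term 6 and no positive coefficients, matching the stopping condition described above.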

There is other important information in the final
dictionary. If b is changed to $b + \epsilon$, for small
$\epsilon^T = (\epsilon_1, \epsilon_2, \epsilon_3)$, then the maximum value of $c^T x$ will change
to $6 + \tfrac{1}{3}\epsilon_1 + \tfrac{4}{3}\epsilon_2$. The coefficient $\tfrac{1}{3}$ is called a “shadow
price,” because it is what we should be willing to pay
per unit increase in $b_1$.

3 How the Algorithm Performs 

In running the simplex algorithm the serious work 
comes in computing the dictionaries. To find dictio- 
nary 2, we could use the first equation of dictionary 1 
to rewrite $x_2$ in terms of $x_1$ and $x_3$, and then substitute
for $x_2$ in the other equations. Versions of the simplex
algorithm have been invented that reduce the computing
effort by taking advantage of special structure in
the matrix A, such as the fact that most of its entries
are zero. The dictionary data is often held in a so-called
tableau of coefficients. 

There are many other practical and theoretical issues.
One concerns the selection of the pivot, that is, the variable
that is to be increased from 0. Starting at O, and
depending on which of $x_1$ and $x_2$ we choose as the first
variable to increase from zero, the path to C can be O,
E, D, C or O, B, C. There is no known way to guarantee
that the algorithm takes the shortest path.

The question of how many steps the simplex algorithm
really needs is related to the famous Hirsch conjecture:
that for any bounded $n$-dimensional polytope
with $m$ faces, the diameter (defined as the maximum
number of edges on the shortest edge-traversing path
between any two vertices) is at most $m - n$. If this were
true, it would suggest that some version of the simplex
algorithm might run in a number of steps that
grows only linearly in the numbers of variables and
constraints. However, Klee and Minty (1972) have given
an example based on a perturbed $n$-dimensional cube
($m = 2n$ faces and diameter $n$), in which if the algorithm
selects among possible pivots by choosing the
one for which the objective function increases at the
greatest rate per unit increase in that variable, then
it visits all $2^n$ vertices before reaching the optimum.
Indeed, for most deterministic pivot selection rules,
examples are known in which the number of steps
grows exponentially in $n$.

Fortunately, things are usually much better in practical
problems than in worst-case examples. Typically,
only $O(m)$ steps are needed to solve a problem with
$m$ constraints. Moreover, Khachian (1979) proved (by
analysis of the so-called ellipsoid algorithm) that linear
programs can in principle be solved by an algorithm
whose running time grows only polynomially in $n$. Thus
linear programming is much easier than “integer linear
programming,” in which $x_1, \ldots, x_n$ are required to be
integers and for which no algorithm with polynomial
running time is known.

Karmarkar (1984) pioneered development of “interior”
methods for linear programming problems. These
move through the interior of the polytope $P$, rather than
among its vertices, and can sometimes solve large LPs
more quickly than the simplex algorithm. Modern computer
software uses both methods and can easily solve
LPs with millions of variables and constraints.

Further Reading 
Solitons

See linear and nonlinear waves and solitons [III.51]
Suppose that the only functions we have come across
are quotients of polynomials and that we are asked to
solve the differential equation

$$f'(x) = \frac{1}{x} \qquad (1)$$

for all $x > 0$, subject to the condition $f(1) = 0$.

If we try $f(x) = P(x)/Q(x)$, where $P$ and $Q$ are
polynomials with no common factors, then we find that

$$x\bigl(Q(x)P'(x) - P(x)Q'(x)\bigr) = Q(x)^2.$$

By comparing coefficients we can show that $Q(0) = P(0) = 0$,
which shows that, contrary to our assumptions,
both $P(x)$ and $Q(x)$ are divisible by $x$. Thus, we
cannot solve equation (1) in terms of known functions.
However, the fundamental theorem of calculus
[I.3 §5.5] tells us that equation (1) does indeed have a
solution, namely

$$F(x) = \int_1^x \frac{1}{t}\,dt.$$

Further study shows that the function $F$ has many
useful properties. For example, using the substitution
$u = t/a$, we find that

$$F(ab) = \int_1^{ab} \frac{1}{t}\,dt = \int_1^{a} \frac{1}{t}\,dt + \int_a^{ab} \frac{1}{t}\,dt = \int_1^{a} \frac{1}{t}\,dt + \int_1^{b} \frac{1}{u}\,du = F(a) + F(b),$$

and, using the formula for differentiating an inverse
function, we find that $F^{-1}$ is the solution of the differential
equation

$$g'(x) = g(x).$$

We therefore give the function a name (the logarithm) 
and add it to our list of standard functions. 

At a more advanced level, integration by parts shows
that the gamma function [III.31] (introduced by Euler
[VI.19])

$$\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt,$$

defined for all $x > 0$, has the property that

$$\Gamma(x) = (x - 1)\Gamma(x - 1)$$

for all $x > 1$, and therefore $\Gamma(n) = (n - 1)!$ for all
integers $n \ge 1$ (since $\Gamma(1) = 1$). As one might expect
from its association with factorials, the gamma function
turns out to be very useful in number theory and
statistics.
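The factorial property can be checked numerically. The following is a rough sketch only: the function name `gamma_integral`, the cutoff `upper`, and the step count are illustrative assumptions, and the integral is estimated by a plain midpoint rule on a truncated range rather than by any refined method.

```python
import math

def gamma_integral(x, upper=60.0, steps=200_000):
    """Midpoint-rule estimate of the integral of t^(x-1) e^(-t) over (0, upper).

    Intended for x >= 1, where the integrand is bounded; the tail beyond
    `upper` is ignored because e^(-t) makes it negligible.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t ** (x - 1) * math.exp(-t)
    return h * total
```

For instance, `gamma_integral(5.0)` should agree closely with $4! = 24$, and `gamma_integral(3.5)` with `2.5 * gamma_integral(2.5)`, in line with the recurrence above.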

In practice, a “special function” is any function that, 
like the logarithm and the gamma function, has been 
extensively studied and has turned out to be useful. 
Some authors use the phrase “special functions” in a
more restricted sense, meaning something like “functions
that turn up in the solution of physical problems”
or “functions other than those generally provided by a 
pocket calculator,” but these restrictions do not seem 
to be very useful. 

In spite of this apparent generality, the theory of special
functions is linked in the minds of many mathematicians
to a collection of particular ideas and methods.
Indeed, it is often linked to particular books like
Whittaker and Watson’s A Course of Modern Analysis 
(which was first published in 1902 and is still sell- 
ing well) and Abramowitz and Stegun’s Handbook of 
Mathematical Functions. These connections may sim- 
ply be accidents of history, but the phrase “special 
functions” is often associated with other phrases like 
“equations of mathematical physics,” “beautiful formu- 
las,” and “sheer ingenuity.” We illustrate this and other 
themes in the particular case of Legendre polynomials. 
(The next paragraph involves more advanced mathe- 
matics and glosses over several long calculations, but 
the reader may simply glance over its contents and 
resume careful reading thereafter.) 

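Suppose that we wish to examine the gravitational
potential $\psi$ of Earth by looking at solutions of
Laplace's equation [I.3 §5.4] $\Delta\psi = 0$. Since Earth is
more or less spherical, we use spherical polar coordinates
$(r, \theta, \phi)$ and, noting that Earth is symmetric about
its axis of rotation, we may suppose that $\psi$ depends
only on $r$ and $\theta$. Under these assumptions, Laplace's
equation takes the form

$$\frac{\partial}{\partial r}\Bigl(r^2 \frac{\partial\psi}{\partial r}\Bigr) + \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\Bigl(\sin\theta\,\frac{\partial\psi}{\partial\theta}\Bigr) = 0. \qquad (2)$$

Following the standard technique of separation of variables,
we look for solutions of the form $\psi(r, \theta) = R(r)\Theta(\theta)$.
After a little calculation, equation (2) yields

$$\frac{1}{R(r)}\frac{d}{dr}\bigl(r^2 R'(r)\bigr) = -\frac{1}{\sin\theta\,\Theta(\theta)}\frac{d}{d\theta}\bigl(\sin\theta\,\Theta'(\theta)\bigr). \qquad (3)$$

Since one side of equation (3) depends on $r$ alone
and the other on $\theta$ alone, both sides must equal some
constant $k$. The equation

$$\frac{d}{dr}\bigl(r^2 R'(r)\bigr) = kR(r)$$

has the solution $R(r) = r^{l}$ whenever $l(l + 1) = k$. The
corresponding equation for $\Theta$ is then

$$\frac{1}{\sin\theta\,\Theta(\theta)}\frac{d}{d\theta}\bigl(\sin\theta\,\Theta'(\theta)\bigr) = -l(l + 1). \qquad (4)$$

We now make the substitution $x = \cos\theta$, $y(x) = \Theta(\theta)$
to convert (4) to Legendre's equation

$$(1 - x^2)y''(x) - 2xy'(x) + l(l + 1)y(x) = 0. \qquad (5)$$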
Routine equating of coefficients reveals that, if we seek
nontrivial solutions of the form $f(x) = \sum_{j=0}^{\infty} a_j x^j$,
then, unless $l$ is an integer, $f(x)$ is unbounded as $x$
approaches 1 (that is, as $\theta$ approaches 0), so these solutions
are not useful physically. However, if $l$ is a positive
integer, then there is a polynomial solution of degree
$l$. (If $l$ is a negative integer, the same polynomials reappear.)
In fact, we have the following stronger statement:
if $l$ is a positive integer, then there exists a unique polynomial
$P_l$ of degree $l$ satisfying Legendre's equation (5)
such that $P_l(1) = 1$. We call $P_l$ the $l$th Legendre polynomial.
Returning to our original problem, we see that it
has solutions of the form

$$\psi(r, \theta) = \sum_{n=0}^{\infty} \frac{A_n P_n(\cos\theta)}{r^{n+1}}.$$

It is obvious to the physicist, and can be proved by the
mathematician, that this is the most general solution if
we also demand that $\psi(r, \theta) \to 0$ as $r \to \infty$. Notice that
if $r$ is large, then only the first few terms will contribute
much to the final answer.

There are many different ways of obtaining the
Legendre polynomials. The reader is invited to verify
that, if we define $Q_n$ inductively by setting $Q_0(x) = 1$
and $Q_1(x) = x$, and using the “three-term recurrence
relation”

$$(n + 1)Q_{n+1}(x) - (2n + 1)xQ_n(x) + nQ_{n-1}(x) = 0,$$

then $Q_n(1) = 1$ and $Q_n$ is a polynomial that satisfies
Legendre's equation (5) (with $l = n$), from which it
follows that $Q_n$ is the Legendre polynomial of degree $n$.
If we set $v_n(x) = (x^2 - 1)^n$, then

$$(x^2 - 1)v_n'(x) = 2nxv_n(x).$$

Differentiating both sides of this equation $n + 1$ times
using Leibniz's rule, we see that $v_n^{(n)}$ satisfies Legendre's
equation (5) with $l = n$. Differentiating $v_n(x) = (x - 1)^n(x + 1)^n$
$n$ times using Leibniz's rule and
noting that all but one of the resulting terms vanish
when $x = 1$, we see that $v_n^{(n)}$ is a polynomial with
$v_n^{(n)}(1) = 2^n n!$. Putting all this information together,
we obtain Rodrigues's formula

$$P_n(x) = \frac{1}{2^n n!}\,v_n^{(n)}(x) = \frac{1}{2^n n!}\,\frac{d^n}{dx^n}(x^2 - 1)^n.$$

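The verification that the reader is invited to carry out for the three-term recurrence can also be done mechanically with exact rational arithmetic. The sketch below is illustrative only; the function names `legendre`, `deriv`, and `legendre_residual` are hypothetical, and polynomials are stored as coefficient lists with the constant term first.

```python
from fractions import Fraction

def legendre(n):
    """Coefficient lists of Q_0, ..., Q_n built from the three-term recurrence."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for k in range(1, n):
        prev, cur = polys[k - 1], polys[k]
        nxt = [Fraction(0)] * (k + 2)
        for i, c in enumerate(cur):       # (2k + 1) x Q_k
            nxt[i + 1] += (2 * k + 1) * c
        for i, c in enumerate(prev):      # - k Q_{k-1}
            nxt[i] -= k * c
        polys.append([c / (k + 1) for c in nxt])
    return polys[: n + 1]

def deriv(p):
    """Coefficient list of the derivative of p."""
    return [i * c for i, c in enumerate(p)][1:]

def legendre_residual(p, l):
    """Coefficients of (1 - x^2) p'' - 2 x p' + l (l + 1) p."""
    d1, d2 = deriv(p), deriv(deriv(p))
    out = [Fraction(0)] * (len(p) + 1)
    for i, c in enumerate(d2):
        out[i] += c                        # p''
        out[i + 2] -= c                    # - x^2 p''
    for i, c in enumerate(d1):
        out[i + 1] -= 2 * c                # - 2 x p'
    for i, c in enumerate(p):
        out[i] += l * (l + 1) * c          # l (l + 1) p
    return out
```

Summing the coefficients of a polynomial gives its value at $x = 1$, so `sum(p) == 1` checks the normalization, and an all-zero residual checks Legendre's equation (5).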


Equation (5) is an example of a Sturm-Liouville equation.
Setting $l = n$ and $y = P_n$ and rewriting slightly,
we obtain the equation

$$\frac{d}{dx}\bigl((1 - x^2)P_n'(x)\bigr) + n(n + 1)P_n(x) = 0. \qquad (6)$$

If $m$ and $n$ are positive integers, then, using (6) and
integrating by parts, we obtain

$$-n(n + 1)\int_{-1}^{1} P_n(x)P_m(x)\,dx = \int_{-1}^{1} \Bigl(\frac{d}{dx}\bigl((1 - x^2)P_n'(x)\bigr)\Bigr)P_m(x)\,dx$$
$$= \bigl[(1 - x^2)P_n'(x)P_m(x)\bigr]_{-1}^{1} - \int_{-1}^{1} (1 - x^2)P_n'(x)P_m'(x)\,dx$$
$$= -\int_{-1}^{1} (1 - x^2)P_n'(x)P_m'(x)\,dx.$$

Thus, by symmetry,

$$n(n + 1)\int_{-1}^{1} P_n(x)P_m(x)\,dx = m(m + 1)\int_{-1}^{1} P_n(x)P_m(x)\,dx,$$

and so, if $m \ne n$,

$$\int_{-1}^{1} P_n(x)P_m(x)\,dx = 0. \qquad (7)$$

The “orthogonality relation” given by (7) has important
consequences. Since $P_r$ is a polynomial of degree
exactly $r$, we know that any polynomial $Q$ of degree
$n - 1$ or less can be written

$$Q(x) = \sum_{r=0}^{n-1} a_r P_r(x),$$

and so

$$\int_{-1}^{1} P_n(x)Q(x)\,dx = \sum_{r=0}^{n-1} a_r \int_{-1}^{1} P_n(x)P_r(x)\,dx = 0. \qquad (8)$$

Thus, $P_n$ is orthogonal to all polynomials of lower
degree.

Suppose that $P_n(x)$ changes sign at the points $\alpha_1, \alpha_2,
\ldots, \alpha_m$ on the interval $[-1, 1]$. Then, if we write

$$Q(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_m),$$

we know that $P_n(x)Q(x)$ does not change sign on
$[-1, 1]$ and so

$$\int_{-1}^{1} P_n(x)Q(x)\,dx \ne 0.$$

By equation (8) this means that the degree $m$ of $Q$ is at
least $n$ and so (since a polynomial of degree $n$ can have
at most $n$ zeros) $P_n$ must have exactly $n$ distinct zeros
on $[-1, 1]$.

Gauss [VI.26] made use of these facts to obtain a powerful
method of numerical integration. Suppose that
$x_1, x_2, \ldots, x_{n+1}$ are distinct points on $[-1, 1]$. If we set

$$e_j(x) = \prod_{i \ne j} \frac{x - x_i}{x_j - x_i},$$

then $e_j(x)$ is a polynomial of degree $n$ that takes the
value 1 when $x = x_j$ and 0 when $x = x_k$ with $k \ne j$.
Thus, if $R$ is any polynomial of degree at most $n$, the
polynomial $Q$ given by

$$Q(x) = R(x_1)e_1(x) + R(x_2)e_2(x) + \cdots + R(x_{n+1})e_{n+1}(x)$$

has degree at most $n$, and $R - Q$ vanishes at the $n + 1$
points $x_j$. It follows that $R = Q$, so

$$R(x) = R(x_1)e_1(x) + R(x_2)e_2(x) + \cdots + R(x_{n+1})e_{n+1}(x).$$

If we write $a_j = \int_{-1}^{1} e_j(x)\,dx$, then

$$\int_{-1}^{1} R(x)\,dx = a_1R(x_1) + a_2R(x_2) + \cdots + a_{n+1}R(x_{n+1}).$$

It is natural to hope that the approximation

$$\int_{-1}^{1} f(x)\,dx \approx a_1f(x_1) + a_2f(x_2) + \cdots + a_{n+1}f(x_{n+1}), \qquad (9)$$

which is an exact equality when $f$ is a polynomial of
degree $n$ or less, will work well for other well-behaved
functions.

Gauss observed that we can make a major improvement
by taking the $x_j$ to be the $n + 1$ roots of the
$(n + 1)$st Legendre polynomial. Suppose that $P$ is a
polynomial of degree at most $2n + 1$. Then we can write

$$P(x) = Q(x)P_{n+1}(x) + R(x),$$

where $Q$ and $R$ are polynomials of degree at most $n$ and
$P_{n+1}$ is the $(n + 1)$st Legendre polynomial. Now $P_{n+1}$ is
orthogonal to polynomials of lower degree (and, in
particular, to $Q$), $P_{n+1}(x_j) = 0$ by the definition of $x_j$,
and the approximation (9) is an equality for $R$. Thus,

$$\int_{-1}^{1} P(x)\,dx = \int_{-1}^{1} P_{n+1}(x)Q(x)\,dx + \int_{-1}^{1} R(x)\,dx$$
$$= 0 + \sum_{j=1}^{n+1} a_j R(x_j)$$
$$= \sum_{j=1}^{n+1} a_j \bigl(P_{n+1}(x_j)Q(x_j) + R(x_j)\bigr)$$
$$= \sum_{j=1}^{n+1} a_j P(x_j).$$

We have shown that the “quadrature formula” (9) is
actually exact for all polynomials of degree at most
$2n + 1$, provided we choose the $x_j$ to be the numbers
suggested by Gauss. Unsurprisingly, this choice
gives an extremely good way of estimating integrals
numerically. “Gaussian quadrature” is one of the two
main methods used to evaluate integrals on computers
today.
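The exactness claim can be tried out with a short, self-contained sketch. Everything here is an illustrative assumption rather than part of the argument above: the function names are hypothetical, the roots are located by Newton's method from a standard starting guess, and the weights are computed from the standard closed form $a_j = 2/\bigl((1 - x_j^2)P_N'(x_j)^2\bigr)$ instead of by integrating the $e_j$. The parameter `points` plays the role of $n + 1$.

```python
import math

def legendre_and_derivative(n, x):
    """P_n(x) and P_n'(x) for n >= 1 and |x| < 1, via the three-term recurrence."""
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    # Standard identity: (x^2 - 1) P_n'(x) = n (x P_n(x) - P_{n-1}(x)).
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp

def gauss_nodes_weights(points):
    """Roots of P_points and the matching quadrature weights on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, points + 1):
        x = math.cos(math.pi * (i - 0.25) / (points + 0.5))  # starting guess
        for _ in range(100):                                  # Newton iteration
            p, dp = legendre_and_derivative(points, x)
            x -= p / dp
        _, dp = legendre_and_derivative(points, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate(f, points):
    """Approximate the integral of f over [-1, 1] with `points` Gauss nodes."""
    xs, ws = gauss_nodes_weights(points)
    return sum(w * f(x) for x, w in zip(xs, ws))
```

With three points the rule should reproduce the integral of $x^4$ (degree at most 5) up to rounding error but not that of $x^6$; with four points $x^6$ is handled as well.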

We conclude with a brief look at a few other special 
functions. 

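Consider de Moivre's formula

$$\cos n\theta + i\sin n\theta = (\cos\theta + i\sin\theta)^n.$$

Using the binomial expansion, we see that

$$\cos n\theta + i\sin n\theta = \sum_{r=0}^{n} \binom{n}{r} i^r \cos^{n-r}\theta\,\sin^{r}\theta,$$

and, taking real parts,

$$\cos n\theta = \sum_{r=0}^{\lfloor n/2 \rfloor} \binom{n}{2r} (-1)^r \cos^{n-2r}\theta\,\sin^{2r}\theta.$$

Since $\sin^2\theta = 1 - \cos^2\theta$, we have

$$\cos n\theta = \sum_{r=0}^{\lfloor n/2 \rfloor} \binom{n}{2r} (-1)^r \cos^{n-2r}\theta\,(1 - \cos^2\theta)^r = T_n(\cos\theta),$$

where $T_n$ is a polynomial of degree $n$ called the $n$th
Chebyshev polynomial. The Chebyshev polynomials
play an important role in numerical analysis.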
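The binomial-sum expression for $T_n$ translates directly into code. This is a minimal sketch (the function name `chebyshev` is an assumption, and in practice a recurrence would usually be preferred for numerical stability):

```python
import math

def chebyshev(n, x):
    """T_n(x) from the real part of the binomial expansion of (cos t + i sin t)^n."""
    return sum(
        math.comb(n, 2 * r) * (-1) ** r * x ** (n - 2 * r) * (1 - x * x) ** r
        for r in range(n // 2 + 1)
    )
```

Evaluating `chebyshev(n, math.cos(t))` and comparing with `math.cos(n * t)` for a few values of `t` gives a quick check of the identity $\cos n\theta = T_n(\cos\theta)$.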

The next collection of functions requires us to calculate
with infinite sums. Readers may treat our calculations
as plausible or justify them rigorously according
to taste. Observe first that

$$h(x) = \sum_{n=-\infty}^{\infty} \frac{1}{(x - n\pi)^2}$$

is well-defined for all real $x$ that are not integer multiples of $\pi$. Note also that
$h(x + \pi) = h(x)$ and $h(\tfrac{1}{2}\pi - x) = h(\tfrac{1}{2}\pi + x)$. Set
$f(x) = h(x) - \operatorname{cosec}^2 x$. By showing that there are
constants $K_1$ and $K_2$ such that

$$0 < \sum_{n \ne 0} \frac{1}{(x - n\pi)^2} < K_1$$

and

$$0 < \operatorname{cosec}^2 x - \frac{1}{x^2} < K_2$$

for all $0 < x \le \tfrac{1}{2}\pi$, we deduce that there is a constant
$K$ such that $|f(x)| < K$ for all $0 < x < \pi$. Simple
calculations show that

$$f(x) = \tfrac{1}{4}\bigl(f(\tfrac{1}{2}x) + f(\tfrac{1}{2}(x + \pi))\bigr). \qquad (10)$$

A single application of (10) shows that $|f(x)| < \tfrac{1}{2}K$
for all $0 < x < \pi$, and repeated applications show that
$f(x) = 0$. Thus

$$\operatorname{cosec}^2 x = \sum_{n=-\infty}^{\infty} \frac{1}{(x - n\pi)^2}$$

for all real $x$ that are not integer multiples of $\pi$.
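Readers who prefer to treat the calculation as plausible can at least test the conclusion numerically. The sketch below assumes the sum is taken over the terms $1/(x - n\pi)^2$; the name `h_partial` and the truncation level are illustrative, and the neglected tail is of size roughly $2/(\pi^2 N)$ when $N$ terms are kept on each side.

```python
import math

def h_partial(x, terms=200_000):
    """Symmetric partial sum of 1 / (x - n*pi)^2 for n = -terms, ..., terms."""
    return sum(1.0 / (x - n * math.pi) ** 2 for n in range(-terms, terms + 1))
```

For example, `h_partial(1.0)` should lie within about $10^{-5}$ of `1 / math.sin(1.0) ** 2`.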

If we seek analogues in the complex plane, we are led
to functions of the type

$$F(z) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \frac{1}{(z - n - mi)^{3}}.$$

Observe that, while the real function $\operatorname{cosec}^2 x$ satisfies
$\operatorname{cosec}^2(x + \pi) = \operatorname{cosec}^2(x)$ and is periodic with period
$\pi$, the complex function $F$ just defined satisfies

$$F(z + 1) = F(z), \qquad F(z + i) = F(z)$$

and is doubly periodic with periods 1 and $i$. Functions
like $F$ are called elliptic functions and have a theory
that parallels that of the trigonometric functions
[III.94].

The function $E(x) = (2\pi)^{-1/2}e^{-x^2/2}$ is called the
Gaussian (see [III.73 §5]) or normal function and
appears in probability and the study of diffusion processes.
The partial differential equation

$$\frac{\partial^2\phi}{\partial x^2}(x, t) = K\frac{\partial\phi}{\partial t}(x, t)$$

with $x$ corresponding to distance and $t$ to time provides
a reasonable model for diffusion. It is easy to check that
$\phi(x, t) = \psi(x, t) = (Kt)^{-1/2}E(x(Kt)^{-1/2})$ is a solution.
By sketching a graph of $\psi(x, t)$ as a function of $x$
for various values of $t$, readers will see that $\psi$ can be
considered as the response to a disturbance at $x = 0$
when $t = 0$. By considering the behavior of $\psi(x, t)$ as
a function of $t$ for a given value of $x$, they will see that
“the effect at $x$ of a disturbance at the origin becomes
noticeable only after a time of the order $x^{2}$.” Living
cells depend on diffusion processes and the result just
given suggests (correctly) that such processes are very
slow over long distances. It is plausible that this sets a
limit on the size of a single cell: a large organism must
be multi-celled.

Statisticians constantly use the related error function

$$\operatorname{erf}(x) = \frac{2}{\pi^{1/2}}\int_0^x \exp(-t^2)\,dt.$$

There is a famous theorem of Liouville [VI.39] that
shows that $\operatorname{erf}(x)$ cannot be expressed as a composition
of elementary functions (such as quotients of polynomials,
trigonometric functions, and exponential
functions [III.25]).

We have been able to look at only a few properties
of a few special functions in this article, but even this
small sample shows how much interesting mathematics
arises when we study one function or a class of
particular functions rather than functions in general.


III.88 The Spectrum

G. R. Allan

In the theory of linear maps [I.3 §4.2], or operators,
on a vector space [I.3 §2.3], the notions of eigenvalue
and eigenvector [I.3 §4.3] play an important
role. Recall that if $V$ is a vector space (over $\mathbb{R}$ or $\mathbb{C}$) and
if $T : V \to V$ is a linear mapping, then an eigenvector
of $T$ is a nonzero vector $e$ in $V$ such that $T(e) = \lambda e$ for
some scalar $\lambda$; then $\lambda$ is the eigenvalue corresponding
to the eigenvector $e$. If $V$ is finite dimensional, then the
eigenvalues are also the roots of the characteristic polynomial
$\chi(t) = \det(tI - T)$ of $T$. Because every nonconstant
complex polynomial has a root (the so-called
fundamental theorem of algebra [V.15]), it follows that
every linear operator on a finite-dimensional, complex
vector space has at least one eigenvalue. If the scalar
field is $\mathbb{R}$, then not all operators have eigenvectors (e.g.,
consider a rotation about the origin in $\mathbb{R}^2$).

The linear operators that arise in analysis usually
act on infinite-dimensional spaces (see [III.52]). We consider
continuous linear operators acting on a complex
Banach space [III.64]; these will be referred to simply
as operators (even though not all linear operators on
an infinite-dimensional Banach space are continuous).
We shall now see that, for $X$ infinite dimensional, not
every such operator has an eigenvalue.

Example 1. Let $X$ be the Banach space $C[0, 1]$, consisting
of all continuous, complex-valued functions on the
closed interval $[0, 1]$ of the real line. The vector-space
structure is the “natural” one (e.g., for $f, g \in X$ the sum
$f + g$ is defined by setting $(f + g)(t) = f(t) + g(t)$ for
each $t$) and the norm is the supremum norm, that is, the
largest value of any $|f(t)|$.

Now let $u$ be a continuous complex-valued function
on $[0, 1]$. We can associate with it a multiplication operator
$M_u$ on $C[0, 1]$ as follows. Given a function $f$, let
$M_u(f)$ be the function that takes $t$ to $u(t)f(t)$. It is
clear that $M_u$ is linear and continuous. We shall see that
whether $M_u$ has an eigenvalue depends on the choice
of $u$. We consider two simple cases.

(i) Let $u$ be the constant function $u(t) \equiv k$. Then
evidently $M_u$ has the single eigenvalue $k$ and every
(nonzero) function $f$ in $X$ is an eigenvector.

(ii) Let $u(t) = t$ for all $t$. Suppose that the complex
number $\lambda$ is an eigenvalue of $M_u$. Then there is
some $f \in C[0, 1]$, not identically zero, such that
$u(t)f(t) = \lambda f(t)$ and so $(t - \lambda)f(t) = 0$ for all
$t$. But then $f(t) = 0$ for all $t \ne \lambda$, so that, since
$f$ is continuous, $f(t) \equiv 0$, contrary to hypothesis.
So, for this choice of $u$, the operator $M_u$ has no
eigenvalue.

Let $X$ be a complex Banach space and let $T$ be an
operator on $X$. Then $T$ is said to be invertible if and
only if there is some operator $S$ on $X$ for which $ST = TS = I$
(here, $ST$ is the composition of $S$ and $T$, and $I$
is the identity operator on $X$). It can be shown that $T$ is
invertible if and only if $T$ is both injective (i.e., $T(x) = 0$
only for $x = 0$) and surjective (i.e., $T(X) = X$). The part
here that is not just simple algebra is to show that if $T$
is both injective and surjective, then the linear inverse
$T^{-1}$ is a continuous operator. A complex number $\lambda$ is
an eigenvalue of $T$ precisely if $T - \lambda I$ is not injective.

If $V$ is a finite-dimensional space, then an injective
operator $T : V \to V$ is necessarily also surjective,
and hence invertible. For $X$ infinite dimensional this
implication is no longer valid.

Example 2. Let $H$ be the Hilbert space [III.37] $\ell^2$ that
consists of all sequences $(\xi_n)_{n \ge 1}$ of complex numbers
such that $\sum_{n \ge 1} |\xi_n|^2 < \infty$. Let $S$ be the “right-shift”
operator defined by $S(\xi_1, \xi_2, \xi_3, \ldots) = (0, \xi_1, \xi_2, \ldots)$.
Then $S$ is injective but not surjective. The “reverse
shift” $S^*$, defined by $S^*(\xi_1, \xi_2, \xi_3, \ldots) = (\xi_2, \xi_3, \xi_4, \ldots)$,
is surjective but not injective.
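The behavior of the two shifts can be illustrated on finitely supported sequences. In this small sketch (the function names and the tuple representation are assumptions made for illustration), a tuple stands for a sequence whose remaining entries are all zero.

```python
def right_shift(xs):
    """S: prepend a zero, so (x1, x2, ...) becomes (0, x1, x2, ...)."""
    return (0,) + tuple(xs)

def left_shift(xs):
    """S*: drop the first entry, so (x1, x2, x3, ...) becomes (x2, x3, ...)."""
    return tuple(xs[1:])
```

Applying `left_shift` after `right_shift` returns the original sequence, whereas applying them in the other order loses the first entry: this reflects the fact that the right shift never produces a sequence with nonzero first entry, while the reverse shift sends $(1, 0, 0, \ldots)$ to zero.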

With this example in mind, we make the following
definition.

Definition 3. Let $X$ be a complex Banach space and let
$T$ be an operator on $X$. The spectrum of $T$, denoted by
$\operatorname{Sp} T$ (or $\sigma(T)$), is the set of all complex numbers $\lambda$ such
that $T - \lambda I$ is not invertible.

The following remarks should be clear.

(i) If $X$ is finite dimensional, then $\operatorname{Sp} T$ is just the set
of eigenvalues of $T$.

(ii) For general $X$, $\operatorname{Sp} T$ includes the set of eigenvalues
of $T$, but may be larger (e.g., in example 2, 0 is
not an eigenvalue of $S$, but 0 does belong to the
spectrum of $S$).

It is easy to show that the spectrum is always a
bounded and closed (i.e., compact [III.9]) subset of $\mathbb{C}$.
A rather deeper fact is that it is never empty: that is,
there will always be some $\lambda$ for which $T - \lambda I$ is not
invertible. That is proved by applying Liouville's theorem
[I.3 §5.6] to the analytic operator-valued function
$\lambda \mapsto (\lambda I - T)^{-1}$, defined for $\lambda$ not in the spectrum of $T$.

Example 1 continued. We have already seen that not
all multiplication operators have eigenvalues. However,
they do have an easily described spectrum. Let $M_u$ be
such an operator and let $S$ be the set of all values $u(t)$
taken by the function $u$. Let $\mu = u(t_0)$ be one of these
values and consider the operator $M_u - \mu I$. Given any
function $f$ in $C[0, 1]$, the value of $(M_u - \mu I)f$ at $t_0$ is
$u(t_0)f(t_0) - \mu f(t_0) = 0$. It follows that $M_u - \mu I$ is not
surjective (for instance, the range of $M_u - \mu I$ does not
contain any nonzero constant function) and therefore
$\mu$ belongs to the spectrum of $M_u$. Thus $S$ is contained
in the spectrum of $M_u$; it is not hard to show that the
two are in fact equal.

We may easily generalize this example to show that if
$K$ is any nonempty compact subset of $\mathbb{C}$, then there is
a linear operator $T$ with $K$ as its spectrum. Let $X$ be the
space of continuous complex-valued functions defined
on $K$, for each $z \in K$ let $u(z) = z$, and let $T$ be the
multiplication operator $M_u$, defined as it was when $K$
was the set $[0, 1]$.

The spectrum is central to most aspects of operator
theory. We shall briefly mention a result about Hilbert-space
operators, known as the spectral theorem (there
are a number of variations).

Let $H$ be a Hilbert space with inner product $\langle x, y \rangle$. A
continuous linear operator $T$ on $H$ is called Hermitian
if $\langle Tx, y \rangle = \langle x, Ty \rangle$ $(x, y \in H)$.

Examples 4.

(i) If $H$ is finite dimensional, then a linear operator
$S$ on $H$ is Hermitian if and only if, with respect
to some (and hence every) orthonormal basis
[III.37], $S$ is represented by a Hermitian matrix (i.e.,
a matrix $A$ with $A = \bar{A}^{T}$).

(ii) On the Hilbert space $L^2[0, 1]$, let $M_u$ be the operator
of multiplication by a continuous function $u$
(just as in example 1, but now we apply $M_u$ to functions
in $L^2[0, 1]$ rather than just $C[0, 1]$). Then $M_u$
is Hermitian if and only if $u$ is real-valued.

If $H$ is finite dimensional and $T$ is a Hermitian operator
on $H$, then $H$ has an orthonormal basis consisting
of eigenvectors of $T$ (a “diagonal basis”). Equivalently,
$T = \sum_{j=1}^{k} \lambda_j P_j$, where $\{\lambda_1, \ldots, \lambda_k\}$ are the distinct
eigenvalues of $T$ and $P_j$ is the orthogonal projection
of $H$ onto the eigenspace $E_j = \{x \in H : Tx = \lambda_j x\}$.

If $H$ is infinite dimensional and $T$ is a Hermitian operator
on $H$, then it is not generally true that $H$ has a
basis of eigenvectors. But, very importantly, the representation
$T = \sum \lambda_j P_j$ does generalize to a representation
$T = \int \lambda\,dP$, a kind of integral with respect to a
“projection-valued measure” on the spectrum of $T$.

There is an intermediate case, for so-called compact
Hermitian operators, “compactness” being a kind of
strong continuity, of great importance in applications.
The technicalities are much simpler than in the general
case, involving an infinite sum, rather than an integral.
A very readable introduction may be found in Young
(1988).

Further Reading 

Young, N. 1988. An Introduction to Hilbert Space. Cam- 
bridge: Cambridge University Press. 


III.89 Spherical Harmonics 


The starting point for Fourier analysis [III.27] is the
observation that a wide class of periodic functions
$f(\theta)$ with period $2\pi$ can be decomposed as infinite
linear combinations of the trigonometric functions
[III.94] $\sin n\theta$ and $\cos n\theta$, or, equivalently, as sums of
the form $\sum_{n=-\infty}^{\infty} a_n e^{in\theta}$.

A useful way to think of a periodic function $f$ defined
on the real line is as an equivalent function $F$ defined on
$\mathbb{T}$, the unit circle in the complex plane. A typical point
on the circle has the form $e^{i\theta}$, and we define $F(e^{i\theta})$ to
be $f(\theta)$. (Note that if we add $2\pi$ to $\theta$ then $F(e^{i\theta})$ does
not change because $e^{i\theta} = e^{i(\theta + 2\pi)}$ and $f(\theta)$ does not
change because $f$ is periodic with period $2\pi$.)

If $f(\theta) = \sum_{n=-\infty}^{\infty} a_n e^{in\theta}$ and $z = e^{i\theta}$, then
$F(z) = \sum_{n=-\infty}^{\infty} a_n z^n$. Therefore, if we consider functions
defined on $\mathbb{T}$ rather than periodic functions
defined on $\mathbb{R}$, then Fourier analysis decomposes our
functions into infinite linear combinations of the functions
$z^n$, where $n$ can be any integer.

What is special about the functions $z^n$? The answer is
that they are the characters of $\mathbb{T}$, which means that they
are the only nonzero continuous complex-valued functions
defined on $\mathbb{T}$ that satisfy the relation $\phi(zw) = \phi(z)\phi(w)$
for every $z$ and $w$ in $\mathbb{T}$.

Now imagine that $F$ is a function defined not on $\mathbb{T}$
but on the two-dimensional set $S^2$, which is the unit
sphere in $\mathbb{R}^3$ (defined as the set of points $(x, y, z)$ such
that $x^2 + y^2 + z^2 = 1$). More generally, how about functions
$F$ defined on $S^{d-1}$ (defined as the set of points
$(x_1, \ldots, x_d)$ such that $x_1^2 + \cdots + x_d^2 = 1$)? Is there a natural
way of decomposing such an $F$, at least if it is sufficiently
nice? That is, is there a good way of generalizing
Fourier analysis to higher-dimensional spheres?

There is an important and initially discouraging difference
between the sphere $S^2$ and the circle $S^1 = \mathbb{T}$. We
defined $\mathbb{T}$ as a set of complex numbers rather than as a
set of points in the plane $\mathbb{R}^2$ because that way it forms a
multiplicative group. The sphere, by contrast, does not
have a useful group structure (for a clue about why,
see quaternions, octonions, and normed division
algebras [III.78]), so we cannot talk about characters.
This makes it less obvious what the “nice” functions
should be, into which we might hope to decompose
more general functions.

However, there is another way of explaining why the
trigonometric functions arise naturally, one that does
not involve complex numbers. We can write a typical
point in $S^1$ as $(x, y)$ with $x^2 + y^2 = 1$, or equivalently
as $(\cos\theta, \sin\theta)$ for some real number $\theta$. Then our
basic functions, if we wish to avoid complex numbers,
are $\cos n\theta$ and $\sin n\theta$, but these can also be written in
terms of $x$ and $y$. For instance, $\cos\theta$ and $\sin\theta$ are $x$
and $y$, respectively, $\cos 2\theta = \cos^2\theta - \sin^2\theta = x^2 - y^2$,
and so on. (Note that $x^2 - y^2 = 2x^2 - 1 = 1 - 2y^2$,
since $x^2 + y^2 = 1$.) In general, $\cos n\theta$ and $\sin n\theta$ can
always be written as polynomials in $\cos\theta$ and $\sin\theta$, so
the basic trigonometric functions can be thought of as
restrictions to the unit circle of certain polynomials.

What are these polynomials? It turns out that they are
harmonic and homogeneous. A harmonic polynomial
$p(x, y)$ is one that satisfies the Laplace equation
[I.3 §5.4] $\Delta p = 0$, where $\Delta p$ stands for

$$\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2}.$$

For instance, if $p(x, y) = x^2 - y^2$, then $\partial^2 p/\partial x^2 = 2$
and $\partial^2 p/\partial y^2 = -2$, so $x^2 - y^2$ is, as we would hope,
a harmonic polynomial. Since the Laplacian $\Delta$ is a linear
operator, the harmonic polynomials form a vector
space. A homogeneous polynomial of degree $n$ is one
in which the total degree of each term is $n$, or equivalently
a polynomial $p(x, y)$ such that $p(\lambda x, \lambda y)$ is
always equal to $\lambda^n p(x, y)$. For example, $x^3 - 3xy^2$
is homogeneous of degree 3 (and also harmonic). The
homogeneous harmonic polynomials of degree $n$ form
a subspace of the space of all harmonic polynomials. It
has dimension 1 when $n = 0$ and 2 when $n > 0$. (When
$n > 0$ it corresponds to the space of functions of the
form $A\cos n\theta + B\sin n\theta$. The polynomial $x^3 - 3xy^2$,
for instance, corresponds to the function $\cos 3\theta$.)
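Both claims about $x^3 - 3xy^2$ (that it is harmonic, and that it restricts to $\cos 3\theta$ on the circle) are easy to check by machine. The sketch below uses an ad hoc representation chosen for illustration: a polynomial in $x$ and $y$ is a dictionary mapping exponent pairs `(i, j)` to coefficients, and the function names are hypothetical.

```python
import math

def laplacian(p):
    """Laplacian of a polynomial given as {(i, j): coefficient of x^i y^j}."""
    out = {}
    for (i, j), c in p.items():
        if i >= 2:   # second derivative in x
            out[(i - 2, j)] = out.get((i - 2, j), 0) + c * i * (i - 1)
        if j >= 2:   # second derivative in y
            out[(i, j - 2)] = out.get((i, j - 2), 0) + c * j * (j - 1)
    return {k: v for k, v in out.items() if v != 0}

def evaluate(p, x, y):
    """Value of the polynomial p at the point (x, y)."""
    return sum(c * x ** i * y ** j for (i, j), c in p.items())

cubic = {(3, 0): 1, (1, 2): -3}   # x^3 - 3 x y^2
```

An empty dictionary returned by `laplacian` means that the polynomial is harmonic; by contrast, $x^2 + y^2$ has Laplacian 4 and so is not.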

The notion of a harmonic polynomial generalizes
very easily to higher dimensions. For example, in three
dimensions a harmonic polynomial is a polynomial
$p(x, y, z)$ such that

$$\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2} = 0.$$

A spherical harmonic of order $n$ and dimension $d$ is the
restriction to the sphere $S^{d-1}$ of a harmonic polynomial
in $d$ variables that is homogeneous of degree $n$.

Here are some of the properties of spherical harmonics
that make them particularly useful and closely
analogous to the trigonometric polynomials on the circle.
We shall fix a dimension $d$ and use the notation $d\mu$
to denote Haar measure on the unit sphere $S = S^{d-1}$.
Basically, this means that if $f$ is an integrable function
from $S$ to $\mathbb{R}$, then $\int_S f(x)\,d\mu$ is its average.

(i) Orthogonality. If $p$ and $q$ are spherical harmonics
of dimension $d$ and different degrees, then
$\int_S p(x)q(x)\,d\mu = 0$.

(ii) Completeness. Every function $f : S \to \mathbb{R}$ that
belongs to $L^2(S, \mu)$ (meaning that $\int_S |f(x)|^2\,d\mu$ exists
and is finite) can be written as a sum $\sum_{n=0}^{\infty} H_n$ (with
convergence in $L^2(S, \mu)$), where $H_n$ is a spherical
harmonic of order $n$.

(iii) Finite-dimensionality of decomposition. For each
$d$ and $n$, the vector space of spherical harmonics of
dimension $d$ and order $n$ is finite dimensional.

From these three properties it follows easily that
$L^2(S, \mu)$ has an orthonormal basis [III.37] consisting
of spherical harmonics.

Why are spherical harmonics natural, and why are 
they useful? Both questions can be given several 
answers: here is one for each. 

The Laplace operator $\Delta$, which operates on functions
defined on $\mathbb{R}^n$, can be generalized to functions defined
on any Riemannian manifold [I.3 §6.10] $M$. The generalization,
denoted $\Delta_M$, is called the Laplace-Beltrami
operator for $M$, and its behavior gives one a great
deal of information about the geometry of $M$. In particular,
the Laplace-Beltrami operator can be defined
for the sphere $S^{d-1}$, where it is called simply the Beltrami
operator. It turns out that the spherical harmonics
are the eigenvectors [I.3 §4.3] of the Beltrami operator.
More precisely, a spherical harmonic of dimension
$d$ and order $n$ is an eigenvector with eigenvalue
$-n(n + d - 2)$. (Notice that the second derivative of
$\cos n\theta$ is $-n^2\cos n\theta$, which corresponds to the case
$d = 2$.) This gives an alternative, more natural (but less
elementary) definition of spherical harmonics. This definition,
combined with the fact that the Laplace operator
is self-adjoint, explains many of the important properties
of spherical harmonics. (See linear operators
and their properties [III.52 §3] for an amplification
of this remark.)

One reason for the importance of Fourier analysis is
that many important linear operators become diagonal,
and hence particularly easy to understand, when
they are applied to the Fourier transform of a function.
For example, if $f$ is a smooth periodic function
and we write it as $\sum_{n \in \mathbb{Z}} a_n e^{in\theta}$, then its derivative is
$\sum_{n \in \mathbb{Z}} in\,a_n e^{in\theta}$. Writing $\hat{f}(n)$ for the $n$th Fourier coefficient
of $f$, we deduce that $\widehat{f'}(n) = in\hat{f}(n)$, which tells
us that to differentiate a function $f$ all we have to do
is multiply its Fourier transform pointwise by the function
$g(n) = in$. This provides a very useful technique
for solving differential equations.

As has already been mentioned, spherical harmonics
are eigenfunctions of the Laplacian, but they also diagonalize
several other linear operators. A good example
is the spherical Radon transform, which is defined
as follows. If $f$ is a function from $S^{d-1}$ to $\mathbb{R}$, then its
spherical Radon transform $Rf$ is another function from
$S^{d-1}$ to $\mathbb{R}$, and the value of $Rf$ at a point $x$ is the average
value of $f$ over all points $y$ that are orthogonal
to $x$. This is closely related to the more usual Radon
transform, which replaces a function defined on the
plane by its averages over lines; inverting the Radon
transform is important for creating images from the
outputs of medical scanners. The spherical harmonics
turn out to be eigenfunctions for the spherical Radon
transform. More generally, any transform $T$ of the form
$Tf(x) = \int_S w(x \cdot y)f(y)\,d\mu(y)$, where $w$ is a suitable
function (or generalized function), is diagonalized by
spherical harmonics. The eigenvalue associated with
a given spherical harmonic can be calculated by the
so-called Funk-Hecke formula.

Spherical harmonics give a way of linking CHEBYSHEV AND LEGENDRE
POLYNOMIALS [III.87], and showing that both of them are natural
concepts. The Chebyshev polynomials are those polynomials in $x$
that are also spherical harmonics of dimension 2: that is, that are
equal on $S^1$ to homogeneous harmonic polynomials in two variables.
For instance, because $x^2 + y^2 = 1$ for every $(x, y)$ in the
circle $S^1$, the function $x^3 - 3xy^2$ that we considered earlier
is equal on $S^1$ to the function $4x^3 - 3x$, so $4x^3 - 3x$ is a
Chebyshev polynomial. The Legendre polynomials are those polynomials
in $x$ that are equal to spherical harmonics of dimension 3. For
example, if $p(x, y, z) = 2x^2 - y^2 - z^2$ then $\Delta p = 0$, and
$p(x, y, z) = 3x^2 - 1$ everywhere on $S^2$, since
$x^2 + y^2 + z^2 = 1$. Therefore, $3x^2 - 1$ is a Legendre
polynomial.
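
The two identities above are easy to check numerically. The sketch below is purely illustrative; the function names and the sampling grids are assumptions, not part of the text. It also confirms the familiar fact that $4x^3 - 3x$ evaluated at $x = \cos\theta$ is $\cos 3\theta$. Note that $3x^2 - 1$ is twice the Legendre polynomial $P_2$ in its most common normalization.

```python
import math


def harmonic_2d(x, y):
    # Homogeneous harmonic polynomial in two variables: Re (x + iy)^3.
    return x**3 - 3 * x * y**2


def chebyshev_3(x):
    return 4 * x**3 - 3 * x


def harmonic_3d(x, y, z):
    # Homogeneous harmonic polynomial in three variables (Laplacian 4 - 2 - 2 = 0).
    return 2 * x**2 - y**2 - z**2


def legendre_2_scaled(x):
    # Equals 2 * P_2(x) in the usual normalization P_2(x) = (3x^2 - 1) / 2.
    return 3 * x**2 - 1


# Compare on sample points of the circle S^1.
circle_error = 0.0
for k in range(100):
    theta = 2 * math.pi * k / 100
    x, y = math.cos(theta), math.sin(theta)
    circle_error = max(circle_error,
                       abs(harmonic_2d(x, y) - chebyshev_3(x)),
                       abs(chebyshev_3(x) - math.cos(3 * theta)))

# Compare on sample points of the sphere S^2 (x is the polar axis here).
sphere_error = 0.0
for i in range(20):
    for j in range(20):
        polar = math.pi * i / 19
        azimuth = 2 * math.pi * j / 20
        x = math.cos(polar)
        y = math.sin(polar) * math.cos(azimuth)
        z = math.sin(polar) * math.sin(azimuth)
        sphere_error = max(sphere_error,
                           abs(harmonic_3d(x, y, z) - legendre_2_scaled(x)))
```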

Here is a sketch of a proof that these polynomials 
are equal to the Chebyshev and Legendre polynomials 
as they are usually defined. The usual definition is that 
they are sequences of polynomials, one for each degree, 
that are uniquely determined by certain orthogonal- 
ity relations. Because spherical harmonics of different 
orders are orthogonal, the polynomials just described 
also satisfy certain orthogonality relations. When one 
works out what these are, one discovers that they are 
precisely the relations that define the Chebyshev and
Legendre polynomials. 


III.90 Symplectic Manifolds 

Gabriel P. Paternain


Symplectic geometry is the geometry that governs clas- 
sical physics, and more generally plays an important 
role in helping us to understand the actions of groups 
on manifolds. It shares some features with Riemannian 
geometry and complex geometry, and there is an important
special class of manifolds, the Kähler manifolds, in
which all three geometric structures are unified. 

1 Symplectic Linear Algebra 

Just as RIEMANNIAN GEOMETRY [I.3 §6.10] is based
on EUCLIDEAN GEOMETRY [I.3 §6.2], symplectic geometry
is based on the geometry of the so-called linear
symplectic space $(\mathbb{R}^{2n}, \omega_0)$.

Given two vectors $v = (q, p)$ and $v' = (q', p')$ in the plane
$\mathbb{R}^2$, the signed area $\omega_0(v, v')$ of the
parallelogram spanned by $v$ and $v'$ is given by the formula

$$\omega_0(v, v') = \det\begin{pmatrix} p & p' \\ q & q' \end{pmatrix} = pq' - qp'.$$

It can also be written using matrices and inner products as
$\omega_0(v, v') = v' \cdot Jv$, where $J$ is the $2 \times 2$ matrix

$$J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

If a linear transformation $A : \mathbb{R}^2 \to \mathbb{R}^2$ is
area preserving and orientation preserving, then
$\omega_0(Av, Av') = \omega_0(v, v')$ for every $v$ and $v'$.

Symplectic geometry studies two-dimensional signed 
area measurements like this, as well as transformations
that preserve these measurements, but it applies to 
general spaces of dimension 2n rather than just to the 
plane. 


If we split $\mathbb{R}^{2n}$ up as $\mathbb{R}^n \times \mathbb{R}^n$,
then we can write a vector $v$ in $\mathbb{R}^{2n}$ as $v = (q, p)$,
where $q$ and $p$ each belong to $\mathbb{R}^n$. The standard
symplectic form
$\omega_0 : \mathbb{R}^{2n} \times \mathbb{R}^{2n} \to \mathbb{R}$
is defined by the formula

$$\omega_0(v, v') = p \cdot q' - q \cdot p',$$

where “$\cdot$” denotes the usual inner product in $\mathbb{R}^n$.
Geometrically, $\omega_0(v, v')$ can be interpreted as the sum of
the signed areas of the parallelograms spanned by the projections of
$v$ and $v'$ to the $q_i p_i$-planes. In terms of matrices, we can
write

$$\omega_0(v, v') = v' \cdot Jv, \tag{1}$$

where $J$ is the $2n \times 2n$ matrix

$$J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \tag{2}$$

and $I$ is the $n \times n$ identity matrix.

A linear map $A : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ that
preserves the product $\omega_0$ of any two vectors (that is,
$\omega_0(Av, Av') = \omega_0(v, v')$ for all
$v, v' \in \mathbb{R}^{2n}$) is called a symplectic linear
transformation; equivalently, a $2n \times 2n$ matrix $A$ is
symplectic if and only if $A^{T} J A = J$, where $A^{T}$ is the
transpose of $A$. Symplectic linear transformations are to
symplectic geometry as rigid motions are to Euclidean geometry. The
set of all symplectic linear transformations of
$(\mathbb{R}^{2n}, \omega_0)$ is one of the classical LIE GROUPS
[III.50 §1] and is denoted by $\mathrm{Sp}(2n)$. One can show that
symplectic matrices $A \in \mathrm{Sp}(2n)$ always have DETERMINANT
[III.15] 1, and are thus volume preserving. However, the converse
does not hold when $n \ge 2$. For instance, if $n = 2$, the linear
map

$$(q_1, q_2, p_1, p_2) \mapsto (aq_1, q_2/a, ap_1, p_2/a)$$

has determinant 1 for any $a \ne 0$, but it is symplectic only if
$a^2 = 1$.
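
The matrix criterion $A^{T} J A = J$ is straightforward to test by machine. The sketch below is a minimal illustration in plain Python; the helper names (`standard_J`, `is_symplectic`, `scaling_map`) are hypothetical, coordinates are ordered as $(q_1, q_2, p_1, p_2)$, and $J$ is taken to be the block matrix with $I$ in the upper right and $-I$ in the lower left, consistent with $\omega_0(v, v') = p \cdot q' - q \cdot p'$.

```python
def standard_J(n):
    """The 2n x 2n block matrix [[0, I], [-I, 0]]."""
    size = 2 * n
    J = [[0.0] * size for _ in range(size)]
    for i in range(n):
        J[i][n + i] = 1.0
        J[n + i][i] = -1.0
    return J


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def is_symplectic(A, tol=1e-12):
    """Check A^T J A = J entrywise, up to a small tolerance."""
    size = len(A)
    J = standard_J(size // 2)
    lhs = matmul(matmul(transpose(A), J), A)
    return all(abs(lhs[i][j] - J[i][j]) <= tol
               for i in range(size) for j in range(size))


def scaling_map(a):
    """Matrix of (q1, q2, p1, p2) -> (a*q1, q2/a, a*p1, p2/a)."""
    return [[a, 0.0, 0.0, 0.0],
            [0.0, 1.0 / a, 0.0, 0.0],
            [0.0, 0.0, a, 0.0],
            [0.0, 0.0, 0.0, 1.0 / a]]


def diagonal_determinant(A):
    """Determinant of a diagonal matrix (product of the diagonal entries)."""
    product = 1.0
    for i in range(len(A)):
        product *= A[i][i]
    return product


volume_preserving = diagonal_determinant(scaling_map(2.0)) == 1.0
symplectic_when_a_is_2 = is_symplectic(scaling_map(2.0))
symplectic_when_a_is_minus_1 = is_symplectic(scaling_map(-1.0))
# In the plane, every matrix of determinant 1 is symplectic, e.g. a shear.
shear_is_symplectic = is_symplectic([[1.0, 1.0], [0.0, 1.0]])
```

With $a = 2$ the map preserves volume but multiplies the signed area in the $q_1 p_1$-plane by 4 and divides the area in the $q_2 p_2$-plane by 4, which is why the check fails.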

The standard symplectic form $\omega_0$ has three properties worth
noting. First, it is bilinear: the expression $\omega_0(v, v')$
varies linearly in $v$ when $v'$ is held fixed, and vice versa.
Second, it is antisymmetric: we have
$\omega_0(v, v') = -\omega_0(v', v)$ for all $v$ and $v'$, and in
particular $\omega_0(v, v) = 0$. Finally, it is nondegenerate, which
means that for every nonzero $v$ there is a nonzero $v'$ such that
$\omega_0(v, v') \ne 0$. The standard symplectic form $\omega_0$ is
not the only form that obeys these three properties; however, it
turns out that any form with these three properties can be converted
into the standard form $\omega_0$ after an invertible linear change
of variables. (This is a special case of Darboux’s theorem.) Thus
$(\mathbb{R}^{2n}, \omega_0)$ is essentially the “only” linear
symplectic geometry in $2n$ dimensions. There are no symplectic
forms in odd-dimensional spaces.

2 Symplectic Diffeomorphisms of $(\mathbb{R}^{2n}, \omega_0)$

In Euclidean geometry, all rigid motions are automat- 
ically linear (or affine) transformations. However, in 
symplectic geometry there are many more symplectic
maps than just the symplectic linear transformations.
These nonlinear symplectic maps in $(\mathbb{R}^{2n}, \omega_0)$
are one of the principal objects of study in symplectic 
geometry. 

Let $U \subset \mathbb{R}^{2n}$ be an open set. Recall that a map
$\phi : U \to \mathbb{R}^{2n}$ is called smooth if it has continuous partial
derivatives of all orders. A diffeomorphism is a smooth 
map with smooth inverse. 

A smooth nonlinear map $\phi : U \to \mathbb{R}^{2n}$ is said to be
symplectic if, for every $x \in U$, the Jacobian matrix $\phi'(x)$
of first derivatives of $\phi$ is a symplectic linear
transformation. Informally, a symplectic map is one that behaves
like a symplectic linear transformation at infinitesimally small
scales. Since symplectic linear transformations have determinant 1,
we can conclude using several-variable calculus that a symplectic
map is always locally volume preserving and locally invertible;
roughly speaking, this means that the map $\phi : A \to \phi(A)$ is
invertible whenever $A$ is a sufficiently small subset of $U$, and
$\phi(A)$ has the same volume as $A$. However, the converse is not
true when $n \ge 2$; the class of symplectic maps is much more
restricted than that of volume-preserving maps. In fact, Gromov’s
nonsqueezing theorem (see below) shows how striking this difference
can be.

Symplectic maps have been around for quite a long 
time in Hamiltonian mechanics under the name of 
canonical transformations. We briefly explain this in the 
next subsection. 


2.1 Hamilton’s Equations 


How can we produce nonlinear symplectic maps? Let 
us begin by exploring a familiar example. Consider the 
motion of a simple pendulum with length l and mass 
m and let q(t) be the angle it makes with the vertical 
at time t. The equation of motion is 


$$\frac{d^2 q}{dt^2} + \frac{g}{l} \sin q = 0,$$

where $g$ is the acceleration due to gravity. If we define the
momentum $p$ as $p = ml^2 \dot{q}$, then we may transform this
second-order differential equation into a first-order system in the
phase plane $\mathbb{R}^2$, namely

$$\frac{d}{dt}(q, p) = X(q, p), \tag{3}$$


where the vector field $X : \mathbb{R}^2 \to \mathbb{R}^2$ is given
by the formula $X(q, p) = (p/ml^2, -mgl \sin q)$. For each
$(q(0), p(0)) \in \mathbb{R}^2$ there is a unique solution
$(q(t), p(t))$ to (3) with initial condition $(q(0), p(0))$. Then
for any fixed time $t$ we obtain an evolution map (or flow)
$\phi_t : \mathbb{R}^2 \to \mathbb{R}^2$ given by
$\phi_t(q(0), p(0)) = (q(t), p(t))$, which has the remarkable
property of being area preserving. This can be deduced from the
observation that $X$ is divergence free, or in other words that

$$\frac{\partial}{\partial q}\left(\frac{p}{ml^2}\right) + \frac{\partial}{\partial p}(-mgl \sin q) = 0.$$

In fact, for every time $t$, $\phi_t$ is a symplectic map on
$(\mathbb{R}^2, \omega_0)$.

More generally, any system in classical mechanics with finitely
many degrees of freedom can be reformulated in a similar fashion, so
that the evolution maps $\phi_t$ are always symplectic maps; in this
context, they are also known as canonical transformations. The Irish
mathematician WILLIAM ROWAN HAMILTON [VI.37] showed us how to do this
in general more than 170 years ago. Given any smooth function
$H : \mathbb{R}^{2n} \to \mathbb{R}$ (called the Hamiltonian), the
system of first-order differential equations given by

$$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \tag{4}$$

$$\frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} \tag{5}$$

will (under some mild growth assumptions on $H$, which we ignore
here) give rise to evolution operators
$\phi_t : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$, which are symplectic
maps on $(\mathbb{R}^{2n}, \omega_0)$ for every time $t$. To see the
connection with the symplectic form $\omega_0$, observe that we may
rewrite (4) and (5) in the following equivalent form:

$$\frac{dx}{dt} = J\nabla H(x), \tag{6}$$

where $\nabla H$ is the usual GRADIENT [I.3 §5.3] of $H$ and $J$ was
defined in (2). From (6), (1), and the antisymmetry property of
$\omega_0$, it is then not difficult to verify that $\phi_t$ is a
symplectic diffeomorphism for every $t$ (the main trick is to
compute the derivative of $\omega_0(\phi_t'(x)v, \phi_t'(x)v')$ in
$t$ and check that it equals zero).
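
As a numerical illustration of area preservation, here is a hedged Python sketch for the pendulum. It does not compute the exact flow $\phi_t$; instead it uses the symplectic Euler scheme, a standard discretization that is itself an area-preserving map and approximates the flow for small step sizes. The parameter values and function names are assumptions chosen for illustration. In the plane a linear map is symplectic exactly when its determinant is 1, so the sketch tracks the Jacobian of the composed map by the chain rule and reports its determinant.

```python
import math

# Illustrative parameter values (assumptions, not taken from the text).
M, L, G = 1.0, 1.0, 9.81


def symplectic_euler_step(q, p, h):
    """One step of size h for dq/dt = p/(m l^2), dp/dt = -m g l sin q."""
    p_new = p - h * M * G * L * math.sin(q)
    q_new = q + h * p_new / (M * L * L)
    return q_new, p_new


def step_jacobian(q, h):
    """Jacobian of one step at a point with angle q.

    Rows are (dq_new/dq, dq_new/dp) and (dp_new/dq, dp_new/dp).
    """
    k = h * M * G * L * math.cos(q)
    c = h / (M * L * L)
    return [[1.0 - c * k, c], [-k, 1.0]]


def approximate_flow(q0, p0, t, steps):
    """Approximate phi_t(q0, p0) and the Jacobian of the composed map."""
    h = t / steps
    q, p = q0, p0
    jac = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(steps):
        s = step_jacobian(q, h)
        # Chain rule: the new Jacobian is (step Jacobian) * (old Jacobian).
        jac = [[sum(s[i][k] * jac[k][j] for k in range(2)) for j in range(2)]
               for i in range(2)]
        q, p = symplectic_euler_step(q, p, h)
    return (q, p), jac


(q_end, p_end), jac = approximate_flow(1.0, 0.0, t=2.0, steps=2000)

# Factor by which the map scales signed areas near the starting point.
area_factor = jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
```

Each step has Jacobian determinant exactly 1, so `area_factor` stays at 1 up to rounding error however many steps are composed. An explicit Euler scheme, by contrast, would not have this property.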

We have already pointed out that symplectic maps 
are volume preserving. The preservation of volume by 
Hamiltonian systems (a result known as Liouville’s theo- 
rem) attracted considerable attention in the nineteenth 
century and it was a driving force in the development 
of ERGODIC THEORY [V.11], which studies recurrence
properties of measure-preserving transformations.

Symplectic maps or canonical transformations play
an important role in classical physics, as they allow 
one to replace a complicated system by an equivalent 
system that is simpler to analyze. 

2.2 Gromov’s Nonsqueezing Theorem 

What is the difference between a symplectic map and a
volume-preserving map? In order to answer this question, suppose
that we have two connected open sets $U$ and $V$ in
$\mathbb{R}^{2n}$ and that we wish to embed one into the other using
a symplectic map. This means that we are looking for a symplectic
map $\phi : U \to V$ such that $\phi$ is a homeomorphism onto its
image. We know such a $\phi$ must be volume preserving, so we
clearly have the restriction that the volume of $U$ should be at
most the volume of $V$, but is this restriction all that matters?
Consider the open ball
$B(R) = \{x \in \mathbb{R}^{2n} : |x| < R\}$, which has radius $R$
and center at the origin, and clearly has finite volume. It is not
hard to embed it symplectically into the infinite-volume cylinder
given by

$$C(r) = \{(q, p) \in \mathbb{R}^{2n} : q_1^2 + q_2^2 < r^2\}$$

for any positive $R$ and $r$. Indeed, the linear symplectic map

$$(q, p) \mapsto (aq_1, aq_2, q_3, \dots, q_n, p_1/a, p_2/a, p_3, \dots, p_n)$$

will do the trick when $a$ is sufficiently small and positive.
However, the situation is radically different if instead we consider
the infinite-volume cylinder

$$Z(r) = \{(q, p) \in \mathbb{R}^{2n} : q_1^2 + p_1^2 < r^2\}.$$

We could try with a similar linear map like

$$(q, p) \mapsto (aq_1, q_2/a, q_3, \dots, q_n, ap_1, p_2/a, p_3, \dots, p_n).$$

This map is volume preserving (it has determinant 1) and for $a$
small it embeds $B(R)$ into $Z(r)$. However, it is symplectic only
if $a = 1$, so it will give a symplectic embedding only if
$R \le r$. One is tempted to think that if $R > r$, then there
should still be a nonlinear symplectic embedding squeezing $B(R)$
into $Z(r)$, but a remarkable theorem of Gromov from 1985 asserts
that it is not possible to find such a map.

In spite of this deep result of Gromov, and other 
results that followed it, we still do not know much 
about how sets in $\mathbb{R}^{2n}$ embed into one another.

3 Symplectic Manifolds 

Recall from DIFFERENTIAL TOPOLOGY [IV.9] that a manifold of
dimension $d$ is a TOPOLOGICAL SPACE [III.92] such that each point
has a neighborhood that is homeomorphic to an open set in Euclidean
space $\mathbb{R}^d$. One can think of $\mathbb{R}^d$ as a local
model for this manifold, in the sense that it describes what the
manifold looks like at very small distance scales. Recall also that
a smooth manifold is one for which the “transition functions” are
smooth. This means that if $\phi : U \to \mathbb{R}^d$ and
$\psi : V \to \mathbb{R}^d$ are coordinate charts, then the
transition function $\psi \circ \phi^{-1}$ between the open sets
$\phi(U \cap V)$ and $\psi(U \cap V)$ is smooth.

A symplectic manifold is defined similarly, but now the local model
is the linear symplectic space $(\mathbb{R}^{2n}, \omega_0)$. More
precisely, a symplectic manifold $M$ is a manifold of dimension $2n$
that can be covered with domains of coordinate charts whose
transition functions are symplectic diffeomorphisms of
$(\mathbb{R}^{2n}, \omega_0)$.

Of course, any open set in $(\mathbb{R}^{2n}, \omega_0)$ is a
symplectic manifold. An example of a compact symplectic manifold is
the torus $\mathbb{T}^{2n}$, which is obtained as the quotient of
$\mathbb{R}^{2n}$ by the action of $\mathbb{Z}^{2n}$. In other
words, two points $x, y \in \mathbb{R}^{2n}$ are equivalent if
$x - y$ has integer coordinates. Other important examples of
symplectic manifolds include RIEMANN SURFACES [III.81], COMPLEX
PROJECTIVE SPACE [III.74], and COTANGENT BUNDLES [IV.10 §5].
However, it is a wide open problem to determine, given a compact
manifold, whether it can be assigned a system of coordinate charts
that makes it symplectic.

We have seen that in $(\mathbb{R}^{2n}, \omega_0)$, one can assign
an “area” $\omega_0(v, v')$ to any parallelogram in the space
$\mathbb{R}^{2n}$. In a symplectic manifold $M$, one can similarly
assign an area $\omega_p(v, v')$, but only to infinitesimal
parallelograms based at a point $p \in M$. The axes of such a
parallelogram are two infinitesimal vectors (or more precisely
tangent vectors) $v$ and $v'$. There is a unique way to do this so
that all the coordinate charts for $M$ are symplectic
diffeomorphisms. In the language of DIFFERENTIAL FORMS [III.16], the
map $p \mapsto \omega_p$ is an antisymmetric nondegenerate 2-form,
which can then be used to compute the “area” $\int_S \omega$ of
noninfinitesimal two-dimensional surfaces $S$ in $M$. One can show
that for any sufficiently small closed surface $S$, the integral
$\int_S \omega$ vanishes, so $\omega$ is a closed form. Indeed, one
can define a symplectic manifold more abstractly (without reference
to charts) as a smooth manifold equipped with a closed,
antisymmetric nondegenerate 2-form $\omega$; a classical theorem of
Darboux asserts that this abstract definition is equivalent to the
more concrete definition
using coordinate charts.

Finally, a special class of symplectic manifolds is given by Kähler
manifolds. These are symplectic manifolds that are also complex
manifolds, in such a way that the two structures are naturally
compatible, a condition that generalizes the relationship (1).
Observe that if one identifies points $(q, p)$ in $\mathbb{R}^{2n}$
with points $p + iq$ in $\mathbb{C}^n$, then the linear
transformation $J : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ becomes the
operation of multiplication by $i$:

$$J : (z_1, \dots, z_n) \mapsto (iz_1, \dots, iz_n).$$

Thus the identity (1) relates the symplectic structure (as given by
$\omega_0$), the complex structure (as given by $J$), and the
Riemannian structure (as given by the dot product “$\cdot$”). A
complex manifold is a manifold that at small distance scales looks
like regions of $\mathbb{C}^n$, with the transition functions
required to be HOLOMORPHIC [I.3 §5.6]. (A smooth map
$f : U \subset \mathbb{C}^n \to \mathbb{C}^n$ is said to be
holomorphic if each coordinate component of $f(z_1, \dots, z_n)$ is
holomorphic in each variable $z_k$.) On a complex manifold we can
multiply tangent vectors by $i$. This gives us at each point
$p \in M$ a linear map $J_p$ such that $J_p^2 v = -v$ for all
tangent vectors $v$ at $p$. A Kähler manifold is a complex manifold
$M$ with a symplectic structure $\omega$ (which computes signed
areas of infinitesimal parallelograms) and a Riemannian metric $g$
(which computes an inner product $g_p(v, v')$ of any two tangent
vectors $v, v'$ at $p$); these two structures are linked together by
the analogue of (1), namely

$$\omega_p(v, v') = g_p(v', J_p v).$$

Examples of Kähler manifolds include complex vector spaces
$\mathbb{C}^n$, Riemann surfaces, and complex projective spaces
$\mathbb{CP}^n$.

An example of a compact symplectic manifold that is not Kähler can
be obtained by taking the quotient of $\mathbb{R}^4$ by a symplectic
action of a group that looks like $\mathbb{Z}^4$ but with a group
operation that differs from the usual one. The change in the group
structure manifests itself as a topological property (an odd first
Betti number) that prevents the quotient being Kähler.

Further Reading 

Arnold, V. I. 1989. Mathematical Methods of Classical 
Mechanics, 2nd edn. Graduate Texts in Mathematics, vol- 
ume 60. New York: Springer. 

McDuff, D., and D. Salamon. 1998. Introduction to Symplec- 
tic Topology, 2nd edn. Oxford Mathematical Monographs. 
Oxford: Clarendon Press/Oxford University Press.


III.91 Tensor Products

If $U$, $V$, and $W$ are VECTOR SPACES [I.3 §2.3] over some
field, then a bilinear map from $U \times V$ to $W$ is a map $\phi$
obeying the rules

$$\phi(\lambda u + \mu u', v) = \lambda\phi(u, v) + \mu\phi(u', v)$$

and

$$\phi(u, \lambda v + \mu v') = \lambda\phi(u, v) + \mu\phi(u, v').$$

That is, it is linear in each variable separately. 

Many important maps, such as INNER PRODUCTS [III.37], are bilinear.
The tensor product $U \otimes V$ of two vector spaces $U$ and $V$ is
a way of capturing the idea of the “most general” bilinear map that
we can define on $U \times V$. To get an idea of what this might
mean, let us try to imagine a “completely arbitrary” bilinear map
from $U \times V$ to a “completely arbitrary” vector space $W$, and
let us use the notation $u \otimes v$ instead of $\phi(u, v)$. Now
because our bilinear map is perfectly general, all we know about it
is what we can deduce from the fact that it is bilinear. For
example, we know that
$u \otimes v_1 + u \otimes v_2 = u \otimes (v_1 + v_2)$. This
example might suggest that all elements of $U \otimes V$ are of the
form $u \otimes v$, but that is certainly not the case: for
instance, in general there is no way of simplifying an expression
such as $u_1 \otimes v_1 + u_2 \otimes v_2$. (This reflects the fact
that the set of values taken by a bilinear map from $U \times V$ to
$W$ is not in general a subspace of $W$.)

Thus, a typical element of $U \otimes V$ is a linear combination of
elements of the form $u \otimes v$, with the rule that different
linear combinations give the same element of $U \otimes V$ whenever
they are forced to by the bilinearity property: for instance,
$(u_1 + 2u_2) \otimes (v_1 - v_2)$ will always be equal to

$$u_1 \otimes v_1 + 2u_2 \otimes v_1 - u_1 \otimes v_2 - 2u_2 \otimes v_2.$$

A more formal way of expressing the above ideas is to say that
$U \otimes V$ has a universal property. (See GROUPS AND GEOMETRY
[IV.11] for some other examples of universal properties. See also
CATEGORIES [III.8].) The property in question is the following:
given any bilinear map $\phi$ from $U \times V$ to a space $W$, we
can find a linear map $\alpha$ from $U \otimes V$ to $W$ such that
$\phi(u, v) = \alpha(u \otimes v)$ for every $u$ and $v$. That is,
every bilinear map defined on $U \times V$ is naturally associated
with a linear map defined on $U \otimes V$. (This linear map takes
$u \otimes v$ to $\phi(u, v)$: the identifications made in the
definition of the tensor product ensure that we can extend this to
linear combinations of such elements in a consistent way.)

It is not hard to show that if $U$ and $V$ are finite dimensional,
with bases $u_1, \dots, u_m$ and $v_1, \dots, v_n$, then the vectors
$u_i \otimes v_j$ form a basis for $U \otimes V$. Other important
properties of the tensor product are that it is commutative and
associative, in the sense that $U \otimes V$ is naturally isomorphic
to $V \otimes U$ and $U \otimes (V \otimes W)$ is naturally
isomorphic to $(U \otimes V) \otimes W$.
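
In coordinates, the basis statement says that an element of $U \otimes V$ can be recorded as an $m \times n$ array of coefficients, with $u \otimes v$ corresponding to the array of products $u_i v_j$. The Python sketch below uses this model to check the bilinear expansion given earlier and to exhibit a sum of two elements of the form $u \otimes v$ that is not itself of that form. The function names are hypothetical, and the test for being of the form $u \otimes v$ relies on the standard fact that an array has this form exactly when all its $2 \times 2$ minors vanish.

```python
def tensor(u, v):
    """Coordinates of u (x) v in the basis u_i (x) v_j, as an m x n array."""
    return [[ui * vj for vj in v] for ui in u]


def add(S, T):
    return [[s + t for s, t in zip(row_s, row_t)] for row_s, row_t in zip(S, T)]


def scale(c, T):
    return [[c * t for t in row] for row in T]


def vec_add(x, y):
    return [a + b for a, b in zip(x, y)]


def vec_scale(c, x):
    return [c * a for a in x]


def is_pure_tensor(T):
    """True when T = u (x) v for some u, v, i.e. all 2 x 2 minors vanish."""
    rows, cols = len(T), len(T[0])
    return all(T[i][j] * T[k][l] == T[i][l] * T[k][j]
               for i in range(rows) for k in range(rows)
               for j in range(cols) for l in range(cols))


u1, u2 = [1, 2, 3], [0, 1, -1]
v1, v2 = [2, 5], [7, -3]

# (u1 + 2 u2) (x) (v1 - v2), computed directly ...
left = tensor(vec_add(u1, vec_scale(2, u2)), vec_add(v1, vec_scale(-1, v2)))
# ... and via the expansion forced by bilinearity.
right = add(add(tensor(u1, v1), scale(2, tensor(u2, v1))),
            add(scale(-1, tensor(u1, v2)), scale(-2, tensor(u2, v2))))
expansion_holds = left == right

# e1 (x) e1 + e2 (x) e2 is the 2 x 2 identity array, which has rank 2.
mixed = add(tensor([1, 0], [1, 0]), tensor([0, 1], [0, 1]))
mixed_is_pure = is_pure_tensor(mixed)
```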

We have been discussing tensor products of vector spaces, but the
definition can easily be generalized to any algebraic structure for
which some notion of bilinearity makes sense, such as a MODULE
[III.83 §3] or a C*-ALGEBRA [IV.19 §3]. Sometimes the tensor product
of two structures is not what you would immediately expect. For
instance, let $\mathbb{Z}_n$ be the set of integers mod $n$, and
consider both $\mathbb{Z}_n$ and $\mathbb{Q}$ as modules over
$\mathbb{Z}$. Then their tensor product is zero. This reflects the
fact that every bilinear map from $\mathbb{Z}_n \times \mathbb{Q}$
must be the zero map.

Tensor products occur in many mathematical contexts. For a good
example, see QUANTUM GROUPS [III.77].


Transcendental Numbers 

See IRRATIONAL AND TRANSCENDENTAL 
NUMBERS [III.43] 


III.92 Topological Spaces 

Ben Green 


A topological space is the most basic context in which 
one can understand the notion of a CONTINUOUS FUNCTION
[I.3 §5.2].

Let us recall a standard definition of what it means for a function
$f : \mathbb{R} \to \mathbb{R}$ to be continuous. Suppose that
$f(x) = y$. Then $f$ is continuous at $x$ provided that $f(x')$ is
close to $y$ whenever $x'$ is close to $x$. Of course, to make this
a mathematically rigorous notion we have to be precise about the
meaning of “close.” We could say that $f(x')$ is close to $y$ if
$|f(x') - f(x)| < \epsilon$, where $\epsilon > 0$ is some small
positive constant. And we could deem $x$ to be close to $x'$
whenever $|x - x'| < \delta$, where $\delta$ is another positive
constant.

We say that $f$ is continuous at $x$ if an appropriate $\delta$ can
be found, regardless of how small $\epsilon$ was chosen to be
($\delta$ is allowed, of course, to depend on $\epsilon$). And $f$
is said to be continuous if it is continuous at every point $x$ on
the real line.

How might we generalize this notion, replacing $\mathbb{R}$ by an
arbitrary set $X$? Our existing definition makes sense only if we
can decide when two points $x, x' \in X$ are close. For a general
set, which might not be nicely embedded in Euclidean space, this is
impossible without the addition of further structure. (When such
structure is added one has the notion of a METRIC SPACE [III.58]:
metric spaces are less general than topological spaces.)

If the notion of closeness is unavailable, how should one define
continuity? The answer may be found in the notion of an open set. A
set $U \subset \mathbb{R}$ is said to be open if for any point $x$
in $U$ there is an interval $(a, b)$ that contains $x$ (that is,
$a < x < b$) and is contained in $U$.

It is an amusing exercise to check that if
$f : \mathbb{R} \to \mathbb{R}$ is continuous, and if $U$ is open,
then $f^{-1}(U)$ is open. Conversely, if $f^{-1}(U)$ is open for
every open set $U$, then $f$ is continuous. Thus, at least for
functions from $\mathbb{R}$ to $\mathbb{R}$, one may characterize
continuity purely in terms of open sets. The notion of closeness is
used only when it comes to defining what an open set is.

We now turn to the formal definition. A topological space is a set
$X$ together with a collection $\mathcal{U}$ of subsets of $X$
(called the “open sets”) satisfying the following axioms.

• The empty set $\emptyset$ and the set $X$ are both open.

• $\mathcal{U}$ is closed under taking arbitrary unions (so if
$(U_i)_{i \in I}$ is a collection of open sets, then so is
$\bigcup_{i \in I} U_i$).

• $\mathcal{U}$ is closed under taking finite intersections (so if
$U_1, \dots, U_k$ are open sets, then so is
$U_1 \cap \cdots \cap U_k$).

The collection $\mathcal{U}$ is called a topology on $X$. It is easy
to verify that the usual open subsets of $\mathbb{R}$ satisfy the
above axioms: thus, $\mathbb{R}$ forms a topological space with
these sets.
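
For a finite set the axioms can be checked mechanically. The Python sketch below is an illustration only; the function name `is_topology` and the example collections are assumptions. It uses the observation that, for a finite collection, closure under pairwise unions and pairwise intersections already gives closure under arbitrary unions and finite intersections.

```python
from itertools import combinations


def is_topology(X, open_sets):
    """Check the axioms for a collection of subsets of a finite set X."""
    X = frozenset(X)
    opens = {frozenset(U) for U in open_sets}
    if not all(U <= X for U in opens):
        return False
    # Axiom 1: the empty set and X itself are open.
    if frozenset() not in opens or X not in opens:
        return False
    # Axioms 2 and 3: for finitely many sets, pairwise checks suffice.
    for U, V in combinations(opens, 2):
        if U | V not in opens or U & V not in opens:
            return False
    return True


X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]
# {1} and {2} are open here but their union {1, 2} is not.
missing_union = [set(), {1}, {2}, {1, 2, 3}]
discrete = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]

nested_ok = is_topology(X, nested)
missing_union_ok = is_topology(X, missing_union)
discrete_ok = is_topology(X, discrete)
```

For infinite spaces such as $\mathbb{R}$ no such exhaustive check is possible, and the axioms have to be verified by argument.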

A subset of a topological space is called closed if and 
only if its complement is open. Note that “closed” does 
not mean “not open”: for example, in the space R, the 
half-open interval $[0, 1)$ is neither open nor closed, and
the empty set is both open and closed. 

Note that we do not demand many properties from 
our open sets: this makes the notion of topological 
space a rather general one. Indeed, under many circum- 
stances the concept is a little too general: then it can be 
convenient to assume that a topological space has further
properties. For instance, a topological space $X$ is
called Hausdorff if, for any two distinct points $x_1$ and
$x_2$ in $X$, there are disjoint open sets $U_1$ and $U_2$ that
contain $x_1$ and $x_2$, respectively. Hausdorff topological
spaces (of which R is an obvious example) have many 
useful properties that general topological spaces do not 
necessarily have.

We saw earlier that for functions from $\mathbb{R}$ to $\mathbb{R}$
the notion of continuity could be formulated entirely in terms of
open sets. This means that we can define continuity for functions
between topological spaces: if $X$ and $Y$ are two topological
spaces and if $f : X \to Y$ is a function between them, then we
simply define $f$ to be continuous if $f^{-1}(U)$ is open for every
open set $U \subset Y$. Remarkably, we have found a useful
definition of continuity that does not rely on a notion of distance.
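
On finite spaces this definition can be applied directly, with no notion of distance in sight. The sketch below is illustrative only: the function names, the two-point spaces, and the maps `f` and `g` are assumptions chosen to give one continuous and one discontinuous example.

```python
def preimage(f, domain, subset):
    """The set of points of the domain that f sends into the given subset."""
    return frozenset(x for x in domain if f[x] in subset)


def is_continuous(f, domain, open_sets_domain, open_sets_codomain):
    """f is continuous when the preimage of every open set is open."""
    opens = {frozenset(U) for U in open_sets_domain}
    return all(preimage(f, domain, U) in opens for U in open_sets_codomain)


# Two-point spaces in which exactly one of the two points is open.
X = {"a", "b"}
opens_X = [set(), {"b"}, {"a", "b"}]
opens_Y = [set(), {1}, {0, 1}]

f = {"a": 0, "b": 1}   # preimage of {1} is {"b"}, which is open
g = {"a": 1, "b": 0}   # preimage of {1} is {"a"}, which is not open

f_continuous = is_continuous(f, X, opens_X, opens_Y)
g_continuous = is_continuous(g, X, opens_X, opens_Y)

# With the discrete topology on X, every map out of X is continuous.
discrete_X = [set(), {"a"}, {"b"}, {"a", "b"}]
g_continuous_on_discrete = is_continuous(g, X, discrete_X, opens_Y)
```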

A continuous map that has a continuous inverse is 
known as a homeomorphism. If there is a homeomor- 
phism between two spaces X and Y, then they are 
regarded as equivalent from the point of view of topol- 
ogy. In topology texts one will often see it said that a 
topologist is unable to distinguish between a dough- 
nut and a teacup because each can be continuously 
deformed into the other (imagine that they are both 
made of modeling clay). 

If $X$ is a topological space, then a very useful way of describing
the topology on $X$ is by giving a basis for it. This is a
subcollection $\mathcal{B} \subseteq \mathcal{U}$ with the property
that every open set (that is, every element of $\mathcal{U}$) is a
union of open sets in $\mathcal{B}$. A basis for $\mathbb{R}$ with
the usual topology is the collection of open intervals
$\{(a, b) : a < b\}$, and a basis for $\mathbb{R}^2$ is the
collection of open balls: that is, sets of the form
$B_\delta(x) = \{y : |x - y| < \delta\}$.

Let us give some examples. 

The discrete topology. Let $X$ be any set whatsoever, and take
$\mathcal{U}$ to be the collection of all subsets of $X$. It is a
simple matter to check that the axioms for a topological space are
satisfied.

Euclidean spaces. Let $X = \mathbb{R}^d$, and let $\mathcal{U}$
contain all sets that are open in the Euclidean metric. That is,
$U \subseteq X$ is open if, for every $u \in U$, there is
$\delta > 0$ such that $B_\delta(u)$ is contained in $U$. It is only
slightly more taxing to check that the axioms are satisfied in this
case. More generally, for any metric space the open sets can be
defined in a similar way and they form a topological space.

Subspace topology. If $X$ is a topological space and if
$S \subseteq X$, then we may make $S$ a topological space. We
declare the open sets in $S$ to be all sets of the form $S \cap U$,
where $U \in \mathcal{U}$ is an open set in $X$.

The Zariski topology. This is used in ALGEBRAIC GEOMETRY [IV.7]. It
is specified by giving its closed sets (and hence, by
complementation, its open sets): these are the zero loci of systems
of polynomial equations. On $\mathbb{C}^2$, for example, these
closed sets are precisely the sets of the form

$$\{(z_1, z_2) : f_1(z_1, z_2) = f_2(z_1, z_2) = \cdots = f_k(z_1, z_2) = 0\},$$

where $f_1, \dots, f_k$ are polynomials. To show that this defines a
topology is somewhat nontrivial, the difficulty being to show that
an arbitrary intersection of closed sets is closed (which is
equivalent to the assertion that an arbitrary union of open sets is
open). This is a consequence of Hilbert’s basis theorem.

The notion of topological space is a very good exam- 
ple of the power of abstraction in mathematics. The 
definition is simple and covers a wide variety of nat- 
ural situations, yet it has enough content that one 
can make interesting definitions and prove theorems 
purely within the world of topological spaces. It is often 
fun to take a familiar concept, that applies to $\mathbb{R}$ or $\mathbb{R}^2$,
say, and try to find an analogue of it in the world of 
general topological spaces. We give two examples. 

Connectedness. The rough idea of connectedness is that a connected
set is one that does not break up into pieces in an obvious way.
Most people would imagine that they could discern, from a list of
pictures of reasonably sensible subsets of $\mathbb{R}^2$, which
were connected and which were not. But can one give a precise
mathematical definition that applies to all sets, including
potentially very wild ones, and says whether they are connected or
not? For example, is the space

$$S = ((\mathbb{Q} \times \mathbb{R}) \cup (\mathbb{R} \times \mathbb{Q})) \setminus (\mathbb{Q} \times \mathbb{Q})$$

of points with exactly one rational coordinate (with the subspace
topology) connected or not? It turns out that a definition can
indeed be given, and moreover that it applies not just to
$\mathbb{R}^2$ but to general topological spaces. We say that a
space $X$ is connected if there is no decomposition
$X = U_1 \cup U_2$ of $X$ into two disjoint, nonempty, open sets. We
leave it to the reader to decide whether $S$ is connected or not.

Compactness. This is one of the most important concepts in all of
mathematics, but it can appear strange at first sight. It comes from
attempting to abstract the notion of a closed and bounded set (in
$\mathbb{R}^2$, say) to a general topological space. We say that $X$
is compact if, given any collection $\mathcal{C}$ of open sets $U$
that cover $X$ (i.e., whose union is $X$), we may find a finite
subcollection $\{U_1, \dots, U_k\} \subseteq \mathcal{C}$ that still
covers $X$. Specializing this definition to $\mathbb{R}^2$ with the
usual topology, it can indeed be proved that a set
$S \subseteq \mathbb{R}^2$ is compact (in the subspace topology) if
and only if it is closed and bounded. See COMPACTNESS AND
COMPACTIFICATION [III.9] for more information.


III.93 Transforms 

T. W. Körner


If we have a finite sequence $a_0, a_1, \dots, a_n$ of real numbers
(written briefly as $a$), then we can look at the polynomial

$$P_a(t) = a_0 + a_1 t + \cdots + a_n t^n.$$

Conversely, given a polynomial $Q$ of degree $m \le n$, we can
recover a unique sequence $b_0, b_1, \dots, b_n$ such that

$$Q(t) = P_b(t) = b_0 + b_1 t + \cdots + b_n t^n$$

by, for example, taking $b_k = Q^{(k)}(0)/k!$.

We observe that if $a_0, a_1, \dots, a_n$ and $b_0, b_1, \dots, b_n$
are finite sequences with $a_r = b_r = 0$ for $r > \tfrac{1}{2}n$,
then

$$P_a(t)P_b(t) = P_{a * b}(t),$$

where $a * b = c$ is a sequence $c_0, c_1, \dots, c_n$ given by

$$c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0,$$

where we interpret $a_i$ and $b_i$ as 0 if $i > n$. This sequence
is called the convolution of the sequences $a$ and $b$.
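
The correspondence between multiplying polynomials and convolving coefficient sequences can be seen in a few lines of Python. This is a minimal sketch; the function names and the sample sequences are assumptions. For simplicity `convolve` returns all coefficients of the product, so no zero-padding condition on the inputs is needed.

```python
def convolve(a, b):
    """Coefficient sequence c with c[k] = a[0]*b[k] + a[1]*b[k-1] + ... + a[k]*b[0]."""
    c = [0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            c[i + j] += a_i * b_j
    return c


def evaluate(coeffs, t):
    """Evaluate coeffs[0] + coeffs[1]*t + ... by Horner's rule."""
    result = 0
    for coeff in reversed(coeffs):
        result = result * t + coeff
    return result


a = [1, 2, 3]     # 1 + 2t + 3t^2
b = [4, 0, -1]    # 4 - t^2
c = convolve(a, b)

# P_a(t) * P_b(t) agrees with P_c(t) at every sample point.
products_agree = all(evaluate(a, t) * evaluate(b, t) == evaluate(c, t)
                     for t in range(-3, 4))
```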

To see the kind of use that one can make of this observation,
consider what happens when we throw two dice, the first of which has
probability $a_u$ of showing $u$ and the second of which has
probability $b_v$ of showing $v$. The probability that their sum is
$k$ is given by the number $c_k$ defined above. If we take both
$a_u$ and $b_u$ to be the probability of throwing $u$ with an
ordinary fair die (so they are equal to $\tfrac{1}{6}$ if
$1 \le u \le 6$, and 0 otherwise), then

$$P_c(t) = P_a(t)P_b(t) = \left(\tfrac{1}{6}(t + t^2 + \cdots + t^6)\right)^2.$$

This polynomial can be rewritten as

$$\begin{aligned}
&\tfrac{1}{36}\bigl(t(t + 1)(t^4 + t^2 + 1)\bigr)\bigl(t(t^2 + t + 1)(t^3 + 1)\bigr) \\
&\quad = \tfrac{1}{36}\bigl(t(t + 1)(t^2 + t + 1)\bigr)\bigl(t(t^4 + t^2 + 1)(t^3 + 1)\bigr) \\
&\quad = P_A(t)P_B(t),
\end{aligned}$$

where $A$ and $B$ are two different sequences, given by
$A_1 = A_4 = \tfrac{1}{6}$, $A_2 = A_3 = \tfrac{1}{3}$, and
$A_u = 0$ otherwise, and
$B_1 = B_3 = B_4 = B_5 = B_6 = B_8 = \tfrac{1}{6}$ and $B_v = 0$
otherwise. Thus, if we take two fair dice A and B and number A so
that it has 2 on two faces, 3 on two faces, 1 on one face, and 4 on
the remaining face, and we number B so that it has 1, 3, 4, 5, 6,
and 8 on its faces, then the probability of throwing a sum $k$ is
the same as with an ordinary pair of dice. It is not hard to show,
by considering the roots of the polynomial
$t + t^2 + \cdots + t^6$, that this is the only nonstandard labeling
of dice with strictly positive integers that has this property.
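
The claim about the two relabeled dice can be confirmed by convolving the corresponding probability sequences. The sketch below uses exact rational arithmetic; the helper names are hypothetical, and a die is represented simply by the list of numbers on its six faces.

```python
from collections import Counter
from fractions import Fraction


def face_distribution(faces):
    """Sequence p with p[u] = probability that a fair die with these faces shows u."""
    counts = Counter(faces)
    return [Fraction(counts[u], len(faces)) for u in range(max(faces) + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent throws."""
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            c[i + j] += a_i * b_j
    return c


standard = face_distribution([1, 2, 3, 4, 5, 6])
die_A = face_distribution([1, 2, 2, 3, 3, 4])
die_B = face_distribution([1, 3, 4, 5, 6, 8])

standard_sum = convolve(standard, standard)
relabeled_sum = convolve(die_A, die_B)

same_distribution = standard_sum == relabeled_sum
```

Both sums range from 2 to 12, so the two convolutions have the same length and can be compared entry by entry.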

These general ideas are easily extended to infinite sequences. If
$a$ is the sequence $a_0, a_1, \dots$, we can define an “infinite
polynomial” $(\mathcal{G}a)(t)$ to be $\sum_{r=0}^{\infty} a_r t^r$.
For the moment, we shall proceed formally, without worrying in what
sense the sum exists. Observe that, much as before,

$$(\mathcal{G}a)(t)(\mathcal{G}b)(t) = (\mathcal{G}(a * b))(t),$$

where the infinite sequence $c = a * b$ is given by

$$c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0.$$

(Again, we call this the convolution of $a$ and $b$.)

There is a well-known problem in which we are asked
how many ways there are of making change for $r$ units
of currency using notes of given denominations. (For
example, we can ask how many ways there are of making
\$43 out of \$1 and \$5 bills.) If we can make $r$ units
in $a_r$ ways using one set of denominations and $b_r$ ways
using a completely different set, then it is not hard to
see that, if we are allowed to use both sets of denominations,
we can make up $k$ units in $c_k$ ways, where $c_k$
is again the number defined earlier.

Let us see how this applies in the simple case where
$a_r$ is the number of ways of making up $r$ dollars using
\$1 bills and $b_r$ is the number of ways of making up
$r$ dollars using \$2 bills. We observe that

$$(\mathcal{G}a)(t) = \sum_{r=0}^{\infty} t^r = \frac{1}{1-t},
\qquad
(\mathcal{G}b)(t) = \sum_{r=0}^{\infty} t^{2r} = \frac{1}{1-t^2},$$
and so, using partial fractions,

$$\begin{aligned}
(\mathcal{G}c)(t) = (\mathcal{G}(a * b))(t) &= (\mathcal{G}a)(t)(\mathcal{G}b)(t)\\
&= \frac{1}{(1-t)(1-t^2)} = \frac{1}{(1-t)^2(1+t)}\\
&= \frac{1}{2(1-t)^2} + \frac{1}{4(1+t)} + \frac{1}{4(1-t)}\\
&= \frac{1}{2}\sum_{r=0}^{\infty}(r+1)t^r + \frac{1}{4}\sum_{r=0}^{\infty}(-1)^r t^r + \frac{1}{4}\sum_{r=0}^{\infty} t^r.
\end{aligned}$$

Thus we can make change for $r$ dollars in $\tfrac{1}{2}(r+1)$ ways
when $r$ is odd and $\tfrac{1}{2}(r+2)$ ways when $r$ is even. In this
simple case it is easy to obtain the result directly, but the
method indicated works automatically in all cases. (The
calculations can be made easier if we allow ourselves to
work with complex roots.)
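The closed form can be compared with a direct convolution of the two coefficient sequences. This is a small sketch under the assumption that truncating both series at `N` terms is enough for the comparison; the function name `convolve` is illustrative.

```python
def convolve(a, b):
    """Return c_k = sum_{j=0}^{k} a_j * b_{k-j} for k < min(len(a), len(b))."""
    n = min(len(a), len(b))
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(n)]

N = 20
ways_ones = [1] * N                                      # coefficients of 1/(1 - t)
ways_twos = [1 if r % 2 == 0 else 0 for r in range(N)]   # coefficients of 1/(1 - t^2)
ways_both = convolve(ways_ones, ways_twos)

# Closed form from the partial-fraction expansion: (r+2)/2 for even r, (r+1)/2 for odd r.
closed_form = [(r + 2) // 2 if r % 2 == 0 else (r + 1) // 2 for r in range(N)]

print(ways_both[:8])
print(closed_form[:8])
```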

We have produced a “generating function transform”
or “$\mathcal{G}$-transform,” which takes a sequence $a_0, a_1, \ldots$
into a Taylor series $\sum_{r=0}^{\infty} a_r t^r$. (These names are
not standard: most mathematicians would simply talk
about generating functions [IV.22 §§2.4, 3].) The
next two examples show how we can use $\mathcal{G}$-transforms
to restate problems about sequences as problems about
Taylor series. Consider first the problem of finding a
sequence $u_n$ such that $u_0 = 0$, $u_1 = 1$, and
$$u_{n+2} - 5u_{n+1} + 6u_n = 0$$
for all $n \ge 0$. Observe that we must have

$$u_{n+2}t^{n+2} - 5u_{n+1}t^{n+2} + 6u_n t^{n+2} = 0$$

for all $n \ge 0$, so that summing over all $n \ge 0$ yields
$$\bigl((\mathcal{G}u)(t) - u_1 t - u_0\bigr) - 5t\bigl((\mathcal{G}u)(t) - u_0\bigr) + 6t^2(\mathcal{G}u)(t) = 0.$$
Recalling that $u_0 = 0$, $u_1 = 1$, and rearranging, we
obtain

$$(6t^2 - 5t + 1)(\mathcal{G}u)(t) = t.$$

Thus, using partial fractions, we obtain

$$\begin{aligned}
(\mathcal{G}u)(t) &= \frac{t}{6t^2 - 5t + 1} = \frac{t}{(1-2t)(1-3t)}\\
&= \frac{-1}{1-2t} + \frac{1}{1-3t}\\
&= -\sum_{r=0}^{\infty}(2t)^r + \sum_{r=0}^{\infty}(3t)^r\\
&= \sum_{r=0}^{\infty}(3^r - 2^r)t^r.
\end{aligned}$$

It follows that $u_r = 3^r - 2^r$.
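As a sanity check on the algebra, one can iterate the recurrence directly and compare with the closed form. A minimal sketch; the function name `recurrence_terms` is an assumption made for illustration.

```python
def recurrence_terms(count):
    """Generate u_0, ..., u_{count-1} from u_{n+2} = 5*u_{n+1} - 6*u_n, u_0 = 0, u_1 = 1."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(5 * terms[-1] - 6 * terms[-2])
    return terms[:count]

from_recurrence = recurrence_terms(15)
from_formula = [3**r - 2**r for r in range(15)]

print(from_recurrence[:6])
print(from_formula[:6])
```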

Next consider the rather trivial problem of finding a
sequence $u_n$ such that $u_0 = 1$ and

$$(n+1)u_{n+1} + u_n = 0$$

for all $n \ge 0$. For every $t$ we have

$$(n+1)u_{n+1}t^n + u_n t^n = 0$$

and so, summing over all $n$ and assuming that the usual
laws of differentiation apply to infinite sums, we obtain
$$(\mathcal{G}u)'(t) + (\mathcal{G}u)(t) = 0.$$

This differential equation gives $(\mathcal{G}u)(t) = Ae^{-t}$ for
some constant $A$. Setting $t = 0$, we obtain
$$1 = u_0 = (\mathcal{G}u)(0) = Ae^0 = A.$$

Thus

$$(\mathcal{G}u)(t) = e^{-t} = \sum_{r=0}^{\infty}\frac{(-1)^r}{r!}t^r,$$

so $u_r = (-1)^r/r!$.
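The same answer can be confirmed without any analysis by iterating the recurrence in exact rational arithmetic. A short sketch, with `solve_recurrence` as an illustrative name:

```python
from fractions import Fraction
from math import factorial

def solve_recurrence(count):
    """Iterate u_{n+1} = -u_n / (n + 1) exactly, starting from u_0 = 1."""
    terms = [Fraction(1)]
    for n in range(count - 1):
        terms.append(-terms[n] / (n + 1))
    return terms

exact_terms = solve_recurrence(10)
# Taylor coefficients of e^{-t}, namely (-1)^r / r!.
series_coefficients = [Fraction((-1) ** r, factorial(r)) for r in range(10)]

print(exact_terms[:5])
```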

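We can write down some of the correspondences
between sequences and their $\mathcal{G}$-transforms:

$$\begin{aligned}
(a_0, a_1, a_2, \ldots) &\longleftrightarrow (\mathcal{G}a)(t),\\
(a_0 + b_0, a_1 + b_1, a_2 + b_2, \ldots) &\longleftrightarrow (\mathcal{G}a)(t) + (\mathcal{G}b)(t),\\
a * b &\longleftrightarrow (\mathcal{G}a)(t)(\mathcal{G}b)(t),\\
(0, a_0, a_1, a_2, \ldots) &\longleftrightarrow t(\mathcal{G}a)(t),\\
(a_1, 2a_2, 3a_3, \ldots) &\longleftrightarrow (\mathcal{G}a)'(t).
\end{aligned}$$

It is also important that we can recover the sequence $a$
from its $\mathcal{G}$-transform. One way of seeing this is to note
that

$$a_r = \frac{(\mathcal{G}a)^{(r)}(0)}{r!}.$$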

We can use these rules, as in the examples above, 
to convert problems about sequences into problems 
about functions and vice versa. In textbooks and exam- 
inations, the effect of such a transformation is to make 
things simpler. In real life, it will usually convert your 
problem into a more complicated problem. However, 
occasionally you strike lucky and it is these occasions 
that make transforms such a valuable weapon in the 
mathematician’s armory. 

Up to now we have handled $\mathcal{G}$-transforms formally.
However, if we wish to use the methods of analysis,
we need to know that $\sum_{r=0}^{\infty} a_r t^r$ converges, at least
when $|t|$ is small. Provided that the $a_r$ do not increase
too rapidly, this will always be the case. However, we
run into difficulties when we try to extend our ideas to
“two-sided sequences” $(a_r)$, where $r$ runs through all
integers rather than just the nonnegative ones, and to the
resulting sums $\sum_{r=-\infty}^{\infty} a_r t^r$. If $|t|$ is small, then $|t^r|$ is
large when $r$ is large and negative, while if $|t|$ is large,
then $|t^r|$ is large when $r$ is large and positive. In many
cases, the best we can hope for is that $\sum_{r=-\infty}^{\infty} a_r t^r$ might
converge when $t = -1$ and $t = 1$. It is not very useful
to talk about functions defined at only two points, but
we save the situation by moving from $\mathbb{R}$ to $\mathbb{C}$.

If we have a well-behaved sequence $(a_r)$ of complex
numbers where $r$ runs through all integers, then
we consider the sum $\sum_{r=-\infty}^{\infty} a_r z^r$, where the complex
number $z$ belongs to the unit circle (or, in other words,
$|z| = 1$). Since any such $z$ can be written

$$z = e^{i\theta}$$

with $\theta \in \mathbb{R}$, it is more usual to talk about the $2\pi$-periodic
function $\sum_{r=-\infty}^{\infty} a_r e^{ir\theta}$. We thus have the
“Fourier series transform” (once again, the name is
nonstandard) $\mathcal{H}$ given by

$$(\mathcal{H}a)(\theta) = \sum_{r=-\infty}^{\infty} a_r e^{ir\theta}.$$

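The $\mathcal{H}$-transform takes a two-sided sequence $a$ to
a $2\pi$-periodic complex-valued function $f = \mathcal{H}a$ on
the real line, but historically mathematicians have been
more interested in reversing the process and obtaining
$a$ from $f$. If

$$f(\theta) = \sum_{r=-\infty}^{\infty} a_r e^{ir\theta},$$
then, arguing formally,

$$\begin{aligned}
\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-ik\theta}\,d\theta
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{r=-\infty}^{\infty} a_r e^{i(r-k)\theta}\,d\theta\\
&= \sum_{r=-\infty}^{\infty}\frac{a_r}{2\pi}\int_{-\pi}^{\pi} e^{i(r-k)\theta}\,d\theta\\
&= \sum_{r=-\infty}^{\infty}\frac{a_r}{2\pi}\int_{-\pi}^{\pi}\bigl(\cos(r-k)\theta + i\sin(r-k)\theta\bigr)\,d\theta = a_k.
\end{aligned}$$

If we write

$$\hat{f}(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)e^{-ik\theta}\,d\theta,$$

then we obtain the celebrated Fourier sum formula

$$f(\theta) = \sum_{r=-\infty}^{\infty}\hat{f}(r)e^{ir\theta}. \qquad (1)$$

Dirichlet [VI.36] proved that this formula holds in
its natural interpretation for reasonably well-behaved
functions, but the question of the appropriate interpretation
and proof for wider classes of functions took
much longer to settle (see Carleson's theorem [V.5]).
Aspects of the question are still open today.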
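For a signal with only finitely many nonzero coefficients, the recovery of $a_k$ from $f$ can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a treatment of convergence: the coefficient values are hypothetical, and the integral is replaced by an equally spaced Riemann sum, which is accurate here because the integrand is a trigonometric polynomial of low degree.

```python
import cmath
import math

def fourier_coefficient(f, k, samples=256):
    """Approximate (1/2pi) * integral over [-pi, pi] of f(theta) * exp(-i*k*theta)
    by an equally spaced Riemann sum."""
    step = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        theta = -math.pi + j * step
        total += f(theta) * cmath.exp(-1j * k * theta)
    return total * step / (2 * math.pi)

# Hypothetical two-sided sequence with finitely many nonzero terms a_r.
coefficients = {-2: 0.5, 0: 1.0, 1: -3.0, 3: 2.0 + 1.0j}

def signal(theta):
    """f(theta) = sum over r of a_r * exp(i*r*theta)."""
    return sum(a * cmath.exp(1j * r * theta) for r, a in coefficients.items())

recovered = {k: fourier_coefficient(signal, k) for k in range(-4, 5)}
for k in sorted(recovered):
    print(k, recovered[k])
```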

It is worth noting that we can obtain qualitative information
about a sequence from its $\mathcal{H}$-transform and
vice versa without explicit calculation. For example, if
$a_r r^{m+3}$ forms a bounded sequence, then the rules for
term-by-term differentiation show that $\mathcal{H}a$ is continuously
$m$ times differentiable, and if $f$ is $m$ times continuously
differentiable, then repeated integration by
parts shows that the numbers $r^m \hat{f}(r)$ form a bounded
sequence.

Suppose that $f$ represents a signal fed into a “black
box,” such as a telephone system, which gives rise to
a resultant signal $Tf$. Many important black boxes in
physics and engineering have the “infinite linearity”
property that

$$T\Bigl(\sum_{r=-\infty}^{\infty} c_r g_r\Bigr)(\theta) = \sum_{r=-\infty}^{\infty} c_r\,Tg_r(\theta)$$

for all well-behaved functions $g_r$ and constants $c_r$. Many
such systems also have the key property that
$$Te_k(\theta) = \gamma_k e_k(\theta)$$

for some constant $\gamma_k$, where we have written $e_k(\theta)$ for
the quantity $e^{ik\theta}$. In other words, the functions $e_k$ are
eigenfunctions [I.3 §4.3] for $T$. We can use the Fourier
sum formula to obtain the formula

$$Tf(\theta) = T\Bigl(\sum_{r=-\infty}^{\infty}\hat{f}(r)e_r\Bigr)(\theta) = \sum_{r=-\infty}^{\infty}\gamma_r\hat{f}(r)e_r(\theta).$$

In this context, it makes sense to think of $f$ as the
weighted sum of simple signals $e_k$ of frequency $k$.

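Mathematicians are always interested to see what
happens if sums are replaced by integrals. In this case
we obtain the classical Fourier transform. If $F$ is a reasonably
well-behaved function $F\colon \mathbb{R} \to \mathbb{C}$, then we
define its Fourier transform $\mathcal{F}F$ by the formula

$$(\mathcal{F}F)(\lambda) = \int_{-\infty}^{\infty} F(s)e^{-i\lambda s}\,ds.$$

Much of the analysis that is typically taught in the first
year or two of a university mathematics course was
developed in the context of this transform and related
topics. Using that analysis, it is not hard to obtain the
correspondences

$$\begin{aligned}
F(t) &\longleftrightarrow (\mathcal{F}F)(\lambda),\\
F(t) + G(t) &\longleftrightarrow (\mathcal{F}F)(\lambda) + (\mathcal{F}G)(\lambda),\\
F * G(t) &\longleftrightarrow (\mathcal{F}F)(\lambda)(\mathcal{F}G)(\lambda),\\
F(t + u) &\longleftrightarrow e^{iu\lambda}(\mathcal{F}F)(\lambda),\\
F'(t) &\longleftrightarrow i\lambda(\mathcal{F}F)(\lambda).
\end{aligned}$$

In this context we define the convolution of $F$ and $G$ by
$$F * G(t) = \int_{-\infty}^{\infty} F(t-s)G(s)\,ds.$$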

There is an element of truth in the saying that the
importance of the Fourier transform is that it converts
convolution to multiplication and the importance of
convolution is that it is the operation that is transformed
to multiplication by the Fourier transform. Just
as we can use the $\mathcal{G}$-transform to solve difference equations,
we can use the $\mathcal{F}$-transform to solve important
classes of partial differential equations [I.3 §5.4]
that occur in physics and some parts of probability
theory. For more on the Fourier transform, see [III.27].
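The way convolution turns into multiplication is easiest to see numerically in a finite setting. The sketch below swaps in the discrete Fourier transform on sequences of length $n$ with cyclic convolution, which is an analogue of the integral transform and not the transform defined above; the sample vectors are arbitrary illustrative data.

```python
import cmath

def dft(x):
    """Discrete Fourier transform: X[k] = sum_s x[s] * exp(-2*pi*i*k*s/n)."""
    n = len(x)
    return [sum(x[s] * cmath.exp(-2j * cmath.pi * k * s / n) for s in range(n))
            for k in range(n)]

def cyclic_convolution(x, y):
    """(x * y)[t] = sum_s x[(t - s) mod n] * y[s]."""
    n = len(x)
    return [sum(x[(t - s) % n] * y[s] for s in range(n)) for t in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
y = [0.0, 1.0, 4.0, 0.0, -2.0, 1.5]

lhs = dft(cyclic_convolution(x, y))             # transform of the convolution
rhs = [p * q for p, q in zip(dft(x), dft(y))]   # product of the transforms
max_gap = max(abs(u - v) for u, v in zip(lhs, rhs))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding.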

By rescaling the Fourier sum formula (1), we obtain
the formula

$$F(t) = \sum_{r=-\infty}^{\infty}\Bigl(\frac{1}{2\pi N}\int_{-\pi N}^{\pi N} F(s)e^{-irs/N}\,ds\Bigr)e^{irt/N}$$

when $|t| < \pi N$. If we let $N \to \infty$, we obtain, more or
less formally,

$$F(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}(\mathcal{F}F)(s)e^{ist}\,ds,$$
which translates to the marvelous formula
$$(\mathcal{F}\mathcal{F}F)(t) = 2\pi F(-t).$$

Like the Fourier sum formula, this Fourier inversion
formula can be proved under a wide range of circumstances,
though often at the price of reinterpreting the
formula in novel ways.

Beautiful though the Fourier inversion formula is, it
should be noted that, both in practice and in theory,
we often need only the observation that $\mathcal{F}F = \mathcal{F}G$
implies $F = G$. The uniqueness of the Fourier transform
is often easier to prove and more convenient to
use, and it holds over a wider range of conditions than
the inversion formula. A similar observation holds for
other transforms.

When we talked about the Fourier sums associated
with $2\pi$-periodic functions, we said that $\hat{f}(r)$ measured
the proportion of the signal $f$ with frequency
$2\pi r$. In the same way, $(\mathcal{F}F)(\lambda)$ gives a measure of the
proportion of $F$ composed of frequencies close to $\lambda$.
There is a family of inequalities, known generically as
Heisenberg uncertainty principles, which say, in effect,
that if most of $\mathcal{F}F$ is concentrated in a narrow band,
then the signal $F$ must be very spread out. This fact
places strong restrictions on our ability to manipulate
signals and occupies a central place in quantum theory.

At the beginning of this article we talked about transformations
of sequences and saw that it was easier to
handle one-sided sequences than two-sided sequences.
In the same way, we can apply Fourier transforms to
a wider range of functions $F\colon \mathbb{R} \to \mathbb{C}$ if we know that
$F(t) = 0$ for $t < 0$. More specifically, if $F$ is such a one-sided
function, and if it does not grow too fast, then we
can compute the Laplace transform

$$(\mathcal{L}F)(x + iy) = \int_{-\infty}^{\infty} F(s)e^{-(x+iy)s}\,ds = \int_{0}^{\infty} F(s)e^{-(x+iy)s}\,ds$$

whenever $x$ and $y$ are real and $x$ is sufficiently large. If
we use the more natural notation

$$(\mathcal{L}F)(z) = \int_{0}^{\infty} F(s)e^{-zs}\,ds,$$

we see that $\mathcal{L}F$ can be considered as a weighted average
of holomorphic [I.3 §5.6] (that is, complex differentiable)
functions and this can be used to show
that $\mathcal{L}F$ is holomorphic. The Laplace transform shares
many of the properties of the Fourier transform and
we can use these, as well as the extensive collection of
results on holomorphic functions, whenever we manipulate
Laplace transforms. Many of the deepest results
in number theory, such as the prime number theorem
[V.33], are most easily obtained by clever uses of the
Laplace transform.

The transforms we have discussed all belong to the
same family, as is indicated by the fact that they all
take convolution to multiplication. The general idea of
a transform has been developed in several different
directions, generally by concentrating on some aspects
of the “classical transforms” and being prepared to lose
others.

One of the most important of these new transforms is
the Gelfand transform, which gives a concrete representation
of the abstract commutative Banach algebras. It
is discussed in operator algebras [IV.19 §3.1]. Other
integral transforms extend the integral definition of the
Fourier transform by setting up a correspondence

$$F(t) \longmapsto \int_{-\infty}^{\infty} F(s)K(\lambda - s)\,ds$$
or, more generally,

$$F(t) \longmapsto \int_{-\infty}^{\infty} F(s)K(s,\lambda)\,ds.$$

Another interesting transform is the Radon or x-ray
transform. We shall consider the three-dimensional
case and talk very informally. Suppose we shine a beam
of radiation through a body in direction $u$. Suppose
also that $f$ is a function defined on $\mathbb{R}^3$ that represents
how much radiation is absorbed by different parts of
the body. What we can measure is the amount of radiation
absorbed along any given straight line. We might
present some of this information in the form of a two-dimensional
image, which could represent the amount
absorbed by all lines in the direction $u$. In general, we
can use $f$ to define a new function

$$(\mathcal{R}f)(u,v) = \int_{-\infty}^{\infty} f(tu + v)\,dt,$$

which tells us how much radiation is absorbed along
the line in direction $u$ that goes through a vector $v$ perpendicular
to $u$. The tomography problem deals with
the recovery of $f$ from $\mathcal{R}f$.

Because the idea of a transform has been developed
in so many different directions, any attempt to
give a general definition results in something too general
to be useful. The most that we can say about
the various transforms is that they present a more
or less distant analogy to the classical Fourier transforms
and that this analogy has been found useful
by those who developed them. (See also the Fourier
transform [III.27], spherical harmonics [III.89], representation
theory [IV.12 §3], and wavelets and
applications [VII.3].)


III.94 Trigonometric Functions 

Ben Green 


The basic trigonometric functions “sin” and “cos,” as
well as the four related functions “tan,” “cot,” “sec,”
and “cosec,” will probably be familiar to most readers
in some form. One way to define the sine function $\sin\colon
\mathbb{R} \to [-1, 1]$ is as follows.

In almost all branches of mathematics one measures
angles using radians, which are defined in terms of arc-length:
to say that the angle $\angle AOB$ in figure 1 is $\theta$ radians
is to say that the arc AB of the circle has length $\theta$.
This definition makes sense when $0 \le \theta < 2\pi$. One
then defines $\sin\theta$ to be the length PB, where P is the
foot of the perpendicular from B to OA. It is very important
that this length be taken with the correct sign. If
$0 < \theta < \pi$ then we take the positive sign, whereas if
$\pi < \theta < 2\pi$ we take the negative sign. In other words,
$\sin\theta$ is the $y$-coordinate of the point B.

The sine function is, at the moment, defined on the
interval $[0, 2\pi)$. To define it on all of $\mathbb{R}$ one simply
insists that it be periodic with period $2\pi$ (that is, that
it satisfies the relation $\sin\theta = \sin(2\pi n + \theta)$ for any
integer $n$).

There is one problem with our definition of sine.
What do we mean by the length of the arc AB? The
only really satisfactory way of understanding this is
to use calculus. The equation of the unit circle is
$y = \sqrt{1 - x^2}$, at least if $(x, y)$ lies in the upper-right
quadrant. (Otherwise one must be careful about sign.)
The formula for the arc-length of a curve $y = f(x)$
between $y = a$ and $y = b$ is

$$s = \int_a^b \sqrt{1 + (dx/dy)^2}\,dy.$$

(This may be thought of as a definition, though the
motivation for the definition comes from pictures.) For
the circle, $\sqrt{1 + (dx/dy)^2} = 1/\sqrt{1 - y^2}$. Since the arc-length
of the circle between the points $B = (x, \sin\theta)$
and $A = (1, 0)$ is $\theta$, this gives the formula

$$\int_0^{\sin\theta}\frac{dy}{\sqrt{1 - y^2}} = \theta \qquad (1)$$

for $0 < \theta < \pi/2$ (we do not care about what $x$ is). This
can be regarded as giving a precise, even if implicit,
definition of $\sin\theta$ for $0 < \theta < \pi/2$.

Figure 1 Interpreting trigonometric functions geometrically.

As with many of the most natural concepts in mathematics,
sin may be defined in a multitude of equivalent
ways. Another definition (whose equivalence to the first
one is not obvious) is

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots. \qquad (2)$$

This infinite series converges for all real $z$. The resulting
definition has a distinct advantage over (1), in that
it also makes sense when $z$ is an arbitrary complex
number (that is why we replaced the letter $\theta$ by $z$). It
therefore allows us to extend sin to a holomorphic
function [I.3 §5.6] on $\mathbb{C}$.

If the sine function is analytic, then what is its derivative?
The answer is the cosine function $\cos z$, which may
be defined in much the same way as sin: either geometrically
or using a power series. The power series is

$$\cos z = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \frac{z^6}{6!} + \cdots, \qquad (3)$$

which may be obtained by differentiating the series for
sin term by term (naturally, this is an operation that
must be properly justified, but it can be).

If one differentiates again, one gets the formula
$(d^2/dz^2)\sin z = -\sin z$. In fact, it is possible to define
$\sin\colon \mathbb{R} \to [-1, 1]$ as the unique solution $y$ to the differential
equation $y'' = -y$ that also satisfies the initial
value conditions $y(0) = 0$, $y'(0) = 1$. This is a very
sensible way of proving that the two definitions (1) and
(2) are equivalent (it is a good calculus exercise to prove
that $\sin'' = -\sin$ using (1)).

Ultimately, the power series expansions (2) and (3)
display the most important side of sin and cos, which is
their relation with the exponential function [III.25]:

$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots.$$

Comparing this with (2) and (3), one gets the famous
formula

$$e^{i\theta} = \cos\theta + i\sin\theta.$$
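The comparison of power series can be mirrored numerically by summing the exponential series at a purely imaginary argument. A minimal sketch, in which the truncation at 40 terms and the test angle are arbitrary assumptions:

```python
import math

def exp_series(z, terms=40):
    """Partial sum of 1 + z + z^2/2! + z^3/3! + ... with the given number of terms."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)   # turn z^n/n! into z^(n+1)/(n+1)!
    return total

theta = 0.75
via_series = exp_series(1j * theta)
via_trig = complex(math.cos(theta), math.sin(theta))
gap = abs(via_series - via_trig)
print(via_series, via_trig, gap)
```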

The exponential functions $\theta \mapsto e^{in\theta}$ are characters, that
is, homomorphisms [I.3 §4.1] from $\mathbb{R}/2\pi\mathbb{Z}$ to the unit
circle $S^1$ (which form groups under addition mod $2\pi$
and multiplication, respectively). This makes them the
natural objects with which to do a Fourier analysis
[III.27] of $2\pi$-periodic functions on $\mathbb{R}$. Because sin and
cos are real-valued, it is convenient to try to decompose
such a function $f(x)$ not into exponentials, but as a
series

$$a_0 + a_1\cos x + b_1\sin x + a_2\cos 2x + b_2\sin 2x + \cdots.$$
Under favorable circumstances (if the function $f$ is sufficiently
smooth, say) one can recover the coefficients
$a_i$, $b_i$ by using orthogonality relations such as

$$\frac{1}{\pi}\int_0^{2\pi}\cos nx\cos mx\,dx =
\begin{cases} 0, & n \ne m,\\ 1, & n = m,\end{cases}
\qquad\text{for all } n, m \ge 1,$$

and

$$\frac{1}{\pi}\int_0^{2\pi}\cos nx\sin mx\,dx = 0 \qquad\text{for all } n, m \ge 0.$$
Thus, for example, we have

$$a_n = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos nx\,dx \qquad (n \ge 1).$$

Such decompositions into trigonometric functions ulti- 
mately underlie devices like compact disk players and 
mobile phones. 

Let us conclude by remarking that there is a whole 
zoo of formulas concerning sin, cos, and the other four 
trigonometric functions (which we have not discussed 
here), as well as integrals involving these functions. It is 
these formulas that make the trigonometric functions 
an indispensable tool in classical Euclidean geometry. 
There are many further formulas in that setting. To 
mention just one beautiful example, the area of a tri- 
angle inscribed in a unit circle with angles A, B, and C 
is exactly 2 sin A sin B sin C. 
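The area formula is easy to test numerically. The sketch below picks three arbitrary points on the unit circle (the angles are hypothetical sample values), computes the area from coordinates, and compares it with $2\sin A\sin B\sin C$, where the interior angles are obtained from the inscribed angle theorem.

```python
import math

def inscribed_triangle_area(t1, t2, t3):
    """Area of the triangle with vertices at angles t1, t2, t3 on the unit circle,
    computed from the cross product of two edge vectors."""
    (x1, y1), (x2, y2), (x3, y3) = [(math.cos(t), math.sin(t)) for t in (t1, t2, t3)]
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2

def interior_angles(t1, t2, t3):
    """Interior angles via the inscribed angle theorem: each angle is half of the
    arc cut off by the opposite side."""
    a, b, c = sorted(t % (2 * math.pi) for t in (t1, t2, t3))
    arcs = [b - a, c - b, 2 * math.pi - (c - a)]
    return [arc / 2 for arc in arcs]

positions = (0.3, 2.0, 4.5)          # arbitrary sample positions on the circle
area = inscribed_triangle_area(*positions)
A, B, C = interior_angles(*positions)
formula = 2 * math.sin(A) * math.sin(B) * math.sin(C)
print(area, formula)
```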


Uncountable Sets 

See countable and uncountable sets [III.11]


III.95 Universal Covers


Let $X$ be a topological space [III.92]. A loop in $X$
can be defined as a continuous function $f$ from the
closed interval $[0,1]$ to $X$ such that $f(0) = f(1)$. A
continuous family of loops is a continuous function $F$
from $[0,1]^2$ to $X$ such that $F(t,0) = F(t,1)$ for every
$t$; the idea is that for each $t$ we can define a loop $f_t$
by taking $f_t(s)$ to be $F(t,s)$, and if we do this then the
loops $f_t$ “vary continuously” with $t$. A loop $f$ is contractible
if it can be continuously shrunk to a point:
more formally, there should be a continuous family of
loops $F(t,s)$ with $F(0,s) = f(s)$ for every $s$ and with all
values of $F(1,s)$ equal. If all loops are contractible, then
$X$ is said to be simply connected. For instance, a sphere
is simply connected, but a torus is not because there
are loops that “go around” the torus and therefore cannot
be contracted (since any continuous deformation
of a loop that goes around the torus goes around the
same number of times).

Given any path-connected space (that is, a space $X$
such that any two points in $X$ are linked by a continuous
path), we can define a closely related simply connected
space $\tilde{X}$ as follows. First, we pick an arbitrary
“base point” $x_0$ in $X$. We then take the set of all continuous
paths $f$ from $[0,1]$ to $X$ such that $f(0) = x_0$
(but we do not necessarily ask for $f(1)$ to be $x_0$). Next,
we regard two of these paths $f$ and $g$ as equivalent,
or homotopic, if $f(1) = g(1)$ and there is a continuous
family of paths that begins with $f$ and ends with $g$
and always has the same beginning point and endpoint.
That is, $f$ and $g$ are homotopic if there is a continuous
function $F$ from $[0,1]^2$ to $X$ such that $F(t,0) = x_0$ and
$F(t,1) = f(1) = g(1)$ for every $t$, and $F(0,s) = f(s)$
and $F(1,s) = g(s)$ for every $s$. Finally, we define the
universal cover $\tilde{X}$ of $X$ to be the space of all homotopy
classes of paths: that is, it is the quotient [I.3 §3.3] of
the space of all continuous paths that start at $x_0$ by the
equivalence relation [I.2 §2.3] of homotopy.

Let us see how this works in practice. As mentioned
earlier, the torus is not simply connected, so what is
its universal cover? To answer this question, it helps to
think of the torus in a slightly artificial way: fix a point
$x_0$ and define the torus to be the set of all continuous
paths that begin at $x_0$, with two of these paths regarded
as equivalent if they have the same endpoint. If we do
this, then for each path, “all we care about” is where
it ends, and the set of endpoints is clearly the torus
itself. But this was not the definition of the universal
cover. There we cared not just about the endpoint of
a path but also about how we reached the endpoint.
For instance, if the path happens to be a loop, in which
case the endpoint is $x_0$ itself, then we care about how
many times that loop goes around the torus, and in
what manner it goes around.

The torus can be defined as the quotient of $\mathbb{R}^2$ by
the equivalence relation where we define two points
as equivalent if their difference belongs to $\mathbb{Z}^2$. Then
any point in $\mathbb{R}^2$ maps to a point in the torus (by the
quotient map). Any continuous path on the torus then
“lifts” uniquely to the plane in the following sense. Fix
a point $u_0$ in $\mathbb{R}^2$ that maps to $x_0$ in the torus. Then
if you trace out any continuous path in the torus that
starts at $x_0$, there will be exactly one way of tracing out
a path in $\mathbb{R}^2$ that starts at $u_0$ such that each point in that path maps to
the appropriate point in the path in the torus.

Now suppose that we have two paths in the torus
that start at $x_0$ and end at the same point $x_1$. Then
the “lifts” of those paths both start at $u_0$ but all we
know about their endpoints is that they are equivalent:
we do not know that they are the same. Indeed, if the
first path is a contractible loop and the second is a loop
that goes once around the torus, then their lifts will
end at different points. It turns out (and if you try to
visualize this then you will see that the result is very
natural and plausible) that the “lifts” of two paths will
end at the same point if and only if the original paths
are homotopic. In other words, there is a one-to-one
correspondence between homotopy classes of paths in
the torus and points in $\mathbb{R}^2$. This shows that $\mathbb{R}^2$ is the
universal cover of the torus. In a sense, the operation
of passing from a space to its universal cover “unfolds”
the quotienting operation that we use to get from the
universal cover to the space.
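The lifting of paths from the torus to the plane can be imitated on sampled paths. The following is a rough sketch under explicit assumptions: the torus is represented by coordinates taken mod 1, the path is sampled finely enough that consecutive samples differ by less than half a unit in each coordinate, and the name `lift_path` is illustrative.

```python
def lift_path(torus_path, start):
    """Lift a finely sampled path on the torus R^2/Z^2 to the plane.

    Each sample is a pair of coordinates taken mod 1, and `start` is a point of
    the plane lying over the first sample. Consecutive samples are assumed to be
    less than half a unit apart in each coordinate, so each step has a unique
    small representative, which is the step the lifted path takes.
    """
    lifted = [start]
    for (px, py), (qx, qy) in zip(torus_path, torus_path[1:]):
        dx = (qx - px + 0.5) % 1.0 - 0.5   # step wrapped into [-0.5, 0.5)
        dy = (qy - py + 0.5) % 1.0 - 0.5
        last_x, last_y = lifted[-1]
        lifted.append((last_x + dx, last_y + dy))
    return lifted

steps = 100
# A constant (hence contractible) loop, and a loop that goes once around
# the torus in the first coordinate.
constant_loop = [(0.0, 0.0)] * (steps + 1)
winding_loop = [((j / steps) % 1.0, 0.0) for j in range(steps + 1)]

end_constant = lift_path(constant_loop, (0.0, 0.0))[-1]
end_winding = lift_path(winding_loop, (0.0, 0.0))[-1]
print(end_constant, end_winding)
```

Both loops end at the same point of the torus, but their lifts end at different points of the plane, roughly (0, 0) and (1, 0).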

As its name suggests, the universal cover has a universal
property. Roughly speaking, a cover of a space
$X$ is a space $Y$ and a continuous surjection from $Y$ to
$X$ such that the inverse image of a small neighborhood
in $X$ is a disjoint union of small neighborhoods in $Y$. If
$U$ is the universal cover of $X$ and $Y$ is any other cover
of $X$, then $U$ can be made into a cover of $Y$ in a natural
way. For instance, one can define a cover of the torus
by an infinite cylinder by wrapping the cylinder around,
and the cylinder can in turn be covered by the plane.

An example of the use of universal covers can
be found in geometric and combinatorial group
theory [IV.11 §§7,8].


III.96 Variational Methods 

Lawrence C. Evans 


The calculus of variations is both a theory in itself and
a toolbox of techniques for studying certain kinds of 
(often extremely nonlinear) ordinary and partial differ- 
ential equations. These equations, which arise when we 
seek critical points of appropriate “energy” function- 
als, are usually far more tractable than other nonlinear 
problems. 

1 Critical Points 

Let us begin with a simple observation from first-year
calculus, where we learn that if $f = f(t)$ is a
smooth function defined on the real line $\mathbb{R}$ and if $f$
has a local minimum (or maximum) at a point $t_0$, then
$(df/dt)(t_0) = 0$.

The calculus of variations vastly extends this insight.
The basic object to be considered is a functional $F$,
which is applied not to real numbers but to functions,
or rather to certain admissible classes of functions.
That is, $F$ takes functions $u$ to real numbers $F(u)$.
If $u_0$ is a minimizer of $F$ (that is, $F(u_0) \le F(u)$ for
all admissible functions $u$), then we can expect that
“the derivative of $F$ at $u_0$ is zero.” Of course, this idea
has to be made precise, which one might expect to be
tricky since the space of admissible functions is infinite
dimensional. But in practice these so-called variational
methods end up using just standard calculus, and they
provide deep insights into the nature of minimizing
functions $u_0$.

2 One-Dimensional Problems 

The simplest situation to which variational techniques
apply involves functions of a single variable. Let us see
why minimizers of appropriate functionals in this setting
must automatically satisfy certain ordinary differential
equations.

2.1 Shortest Distance 

As a warmup problem, we shall show that the shortest
path between two points in the plane is a line segment.
Of course, this is obvious, but the methods we develop
can be applied to much more interesting situations.
Suppose, then, that we are given two points $x$ and $y$
in the plane. We take as our class of admissible functions
all smooth, real-valued functions $u$, defined on
some interval $I = [a, b]$, such that $u(a) = x$ and
$u(b) = y$. The length of this curve is

$$F[u] = \int_I (1 + (u')^2)^{1/2}\,dx, \qquad (1)$$

where $u = u(x)$ and a prime denotes differentiation
with respect to $x$. Now suppose that some particular
curve $u_0$ minimizes the length. We want to deduce that
the graph of $u_0$ is a line segment, which we will do by
“setting the derivative of $F$ to zero” at the minimizer
$u_0$.

To make sense of this idea, select any other smooth
function $w$ that is defined on our interval $I$ and that
vanishes at its endpoints. For each $t$ define $f(t)$ to be
$F[u_0 + tw]$. Since the graph of the function $u_0 + tw$
connects the given endpoints, and since $u_0$ gives the
minimum length, it follows that the function $f$, which
is just an ordinary function from $\mathbb{R}$ to $\mathbb{R}$, has a minimum
at $t = 0$. Therefore, $(df/dt)(0) = 0$. But we can explicitly
compute $(df/dt)(0)$ by differentiating under the
integral sign and then integrating by parts. This gives

$$\int_I \frac{u_0' w'}{(1 + (u_0')^2)^{1/2}}\,dx
= -\int_I \Bigl(\frac{u_0'}{(1 + (u_0')^2)^{1/2}}\Bigr)' w\,dx.$$
Both sides equal $(df/dt)(0) = 0$. This identity holds for all functions $w$ with the properties
specified above, and consequently

$$\Bigl(\frac{u_0'}{(1 + (u_0')^2)^{1/2}}\Bigr)' = \frac{u_0''}{(1 + (u_0')^2)^{3/2}} = 0 \qquad (2)$$

everywhere on the interval $I$.

To summarize the discussion so far: if the graph
of $u_0$ minimizes the distance between the given endpoints,
then $u_0''$ identically equals zero, and therefore
the shortest path is a line segment. This conclusion
may not seem too exciting, but even this simple case
has an interesting feature. The calculus of variations
automatically focuses our attention on the expression

$$\frac{u''}{(1 + (u')^2)^{3/2}},$$

which turns out to be the curvature of the graph of
$u$. The graph of the minimizer $u_0$ has zero curvature
everywhere.
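The role of the one-variable function $f(t) = F[u_0 + tw]$ can be made concrete with a small experiment. This is an illustrative sketch only: the endpoints, the perturbation $w$, and the polygonal approximation of length are all assumptions chosen for the example.

```python
import math

def curve_length(u, a, b, pieces=2000):
    """Approximate the length of the graph of u over [a, b] by a fine polygon."""
    h = (b - a) / pieces
    total = 0.0
    for j in range(pieces):
        x0, x1 = a + j * h, a + (j + 1) * h
        total += math.hypot(x1 - x0, u(x1) - u(x0))
    return total

a, b = 0.0, 1.0
line = lambda x: 2.0 * x + 1.0           # candidate minimizer u_0 (a straight line)
bump = lambda x: math.sin(math.pi * x)   # perturbation w, vanishing at both endpoints

def perturbed_length(t):
    """f(t) = F[u_0 + t*w], evaluated with the polygonal approximation."""
    return curve_length(lambda x: line(x) + t * bump(x), a, b)

lengths = {t: perturbed_length(t) for t in (-0.2, -0.1, 0.0, 0.1, 0.2)}
for t in sorted(lengths):
    print(t, lengths[t])
```

The smallest value occurs at $t = 0$, where the length is $\sqrt{5}$, the distance between the endpoints (0, 1) and (1, 3).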

2.2 Generalization: The Euler-Lagrange Equations 

It turns out that the technique we used for the previ- 
ous example is extremely powerful and can be vastly 
generalized. 


One useful extension is to replace the length functional
(1) by a more general functional of the form

$$F[u] = \int_I L(u', u, x)\,dx, \qquad (3)$$

where $L = L(v, z, x)$ is a given function, sometimes
called the Lagrangian. Then $F[u]$ can be interpreted as
the “energy” (or “action”) of a given function $u$ defined
on the interval $I$.

Suppose next that a particular curve $u_0$ is a minimizer
of $F$, subject to certain fixed boundary conditions.
We want to extract information about the behavior
of $u_0$, and to do so we proceed as in our first example.
We select a smooth function $w$ as above, define
$f(t) = F[u_0 + tw]$, observe that $f$ has a minimum at $t = 0$,
and consequently deduce that $(df/dt)(0) = 0$. As
in the previous calculation, we then explicitly compute
this derivative:

$$f'(0) = \int_I L_v w' + L_z w\,dx = \int_I \bigl(-(L_v)' + L_z\bigr)w\,dx.$$

Here, $L_v$ and $L_z$ stand for the partial derivatives $\partial L/\partial v$
and $\partial L/\partial z$, evaluated at $(u_0', u_0, x)$. This expression
equals zero for all functions $w$ satisfying the given
conditions. Therefore,

$$-(L_v(u_0', u_0, x))' + L_z(u_0', u_0, x) = 0 \qquad (4)$$

everywhere on the interval $I$. This nonlinear ordinary
differential equation for the function $u_0$ is called the
Euler-Lagrange equation. The key point is that any minimizer
of our functional $F$ must be a solution of this
differential equation, which often contains important
geometrical or physical information.

For example, take $L(v, z, x) = \tfrac{1}{2}mv^2 - W(z)$, which
we interpret as the difference between the kinetic
energy and the potential energy $W$ of a particle of
mass $m$ moving along the real line. The Euler-Lagrange
equation (4) is then

$$mu_0'' = -W'(u_0),$$

which is Newton's second law of motion. The calculus
of variations provides us with an elegant derivation of
this fundamental law of physics.
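The vanishing of $(df/dt)(0)$ at a solution of the Euler-Lagrange equation can be observed numerically. The sketch below assumes a specific illustrative case that is not in the text: $m = 1$ and $W(z) = \tfrac{1}{2}z^2$ on the interval $[0, 1]$, so that $u_0 = \sin$ satisfies $u_0'' = -W'(u_0)$. The integral is approximated by the midpoint rule and the derivative in $t$ by a central difference.

```python
import math

def action(u, du, a, b, pieces=2000):
    """Midpoint-rule approximation of the integral of (1/2)*u'^2 - (1/2)*u^2 over [a, b]."""
    h = (b - a) / pieces
    total = 0.0
    for j in range(pieces):
        x = a + (j + 0.5) * h
        total += (0.5 * du(x) ** 2 - 0.5 * u(x) ** 2) * h
    return total

def first_variation(u, du, w, dw, a, b, eps=1e-3):
    """Central difference for (d/dt) F[u + t*w] at t = 0."""
    def f(t):
        return action(lambda x: u(x) + t * w(x), lambda x: du(x) + t * dw(x), a, b)
    return (f(eps) - f(-eps)) / (2 * eps)

a, b = 0.0, 1.0
w = lambda x: x * (1.0 - x)      # test function vanishing at both endpoints
dw = lambda x: 1.0 - 2.0 * x

# u'' = -u holds for sin, so the first variation should vanish.
at_solution = first_variation(math.sin, math.cos, w, dw, a, b)

# The straight line with the same endpoint values does not satisfy u'' = -u.
slope = math.sin(1.0)
at_line = first_variation(lambda x: slope * x, lambda x: slope, w, dw, a, b)
print(at_solution, at_line)
```

For the straight line, integrating by hand gives $(df/dt)(0) = -\sin(1)/12$, which is visibly nonzero, whereas the value at the solution is zero up to discretization error.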

2.3 Systems 

We can generalize further, by taking

$$F[\mathbf{u}] = \int_I L(\mathbf{u}', \mathbf{u}, x)\,dx, \qquad (5)$$

where now we are taking vector-valued functions $\mathbf{u}$ that
map the interval $I$ into $\mathbb{R}^m$. If $\mathbf{u}_0$ is a minimizer in
some appropriate class of functions, then one can compute
the Euler-Lagrange equation using ideas similar to
those discussed above. We obtain the equations

$$-(L_{v_k}(\mathbf{u}_0', \mathbf{u}_0, x))' + L_{z_k}(\mathbf{u}_0', \mathbf{u}_0, x) = 0, \qquad (6)$$

one for each $k$. Here $L_{v_k}$ and $L_{z_k}$ represent the partial
derivatives of $L$ with respect to the $k$th variables of $\mathbf{u}'$
and $\mathbf{u}$, evaluated at $(\mathbf{u}_0', \mathbf{u}_0, x)$. These equations form
a system of coupled ordinary differential equations for
the components of $\mathbf{u}_0 = (u_0^1, \ldots, u_0^m)$.

For a geometric example, put

$$L(v, z, x) = \Bigl(\sum_{i,j=1}^{m} g_{ij}(z)v_i v_j\Bigr)^{1/2},$$

so that $F[\mathbf{u}]$ is the length of the curve $\mathbf{u}$ in the Riemannian
metric [I.3 §6.10] determined by the $g_{ij}$. When $\mathbf{u}_0$
is a curve of constant unit speed, the Euler-Lagrange
system of equations (6) can be rewritten, after some
work, to read

$$(u_0^k)'' + \sum_{i,j=1}^{m}\Gamma^k_{ij}(u_0^i)'(u_0^j)' = 0 \qquad (k = 1, \ldots, m)$$

for certain expressions $\Gamma^k_{ij}$, called Christoffel symbols,
that can be computed in terms of the $g_{ij}$. Solutions
of this system of ordinary differential equations are
called geodesics. Thus, we have deduced that length-minimizing
curves are geodesics.

A physical example is $L(v, z, x) = \tfrac{1}{2}m|v|^2 - W(z)$,
for which the Euler-Lagrange equation is
$$m\mathbf{u}_0'' = -\nabla W(\mathbf{u}_0).$$

This is Newton's second law of motion for a particle
in $\mathbb{R}^m$ moving under the influence of the potential
energy $W$.

3 Higher-Dimensional Problems 

Variational methods also apply to expressions involv- 
ing functions of several variables, in which case the 
resulting Euler-Lagrange equations are partial differ- 
ential equations (PDEs). 

3.1 LeastArea 

A first example extends our earlier examination of 
shortest curves. For this problem we are given a region
U in the plane, with boundary $\partial U$, and a real-valued
function g defined on the boundary. We then look at
a class of admissible real-valued functions u, defined
on U, with the condition that u should equal g on the
boundary. We can think of the graph of u as a two-dimensional
curved surface with a boundary equal to
the graph of g. The area of this surface is

$F[u] = \int_U (1 + |\nabla u|^2)^{1/2}\,dx$. (7)

Let us assume that a particular function $u_0$ minimizes
the area among all other surfaces with the given boundary.
What can we deduce about the geometric behavior
of this so-called minimal surface?

Yet again we proceed by writing $f(t) = F[u_0 + tw]$,
differentiating with respect to t, and so on. After some
calculation we eventually discover that

$\operatorname{div}\left(\frac{\nabla u_0}{(1 + |\nabla u_0|^2)^{1/2}}\right) = 0$ (8)

within the region U, where “div” denotes the divergence 
operator. This nonlinear PDE is the minimal surface 
equation. The left-hand side turns out to be a formula 
for (twice) the mean curvature of the graph of $u_0$. Consequently,
we have shown that a minimal surface has
zero mean curvature everywhere. 
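
The calculation leading to (8) can be sketched as follows, under the assumptions (not spelled out in the text) that w is smooth and vanishes on $\partial U$, so that $u_0 + tw$ remains admissible, and that $u_0$ is smooth enough to integrate by parts.

```latex
\begin{align*}
f(t) &= \int_U \bigl(1 + |\nabla u_0 + t\,\nabla w|^2\bigr)^{1/2}\,dx, \\
0 = f'(0) &= \int_U \frac{\nabla u_0 \cdot \nabla w}{(1 + |\nabla u_0|^2)^{1/2}}\,dx
 = -\int_U \operatorname{div}\!\left(\frac{\nabla u_0}{(1 + |\nabla u_0|^2)^{1/2}}\right) w\,dx .
\end{align*}
```

Since the last integral vanishes for every such w, the divergence itself must vanish throughout U.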

Minimal surfaces are sometimes regarded physically 
as the surfaces formed by soap films when they are 
stretched between a fixed wire frame that traces out 
the boundary specified by the function g. 

3.2 Generalization: The Euler-Lagrange Equations 

It is now straightforward, and sometimes very prof- 
itable, to replace the area functional (7) by the general 
expression 

$F[u] = \int_U L(\nabla u, u, x)\,dx$, (9)

in which we now take U to be a region in $\mathbb{R}^n$. Assuming
that $u_0$ is a minimizer, subject to given boundary
conditions, we deduce the Euler-Lagrange equation

$-\operatorname{div}(\nabla_v L(\nabla u_0, u_0, x)) + L_z(\nabla u_0, u_0, x) = 0$. (10)

This is a nonlinear PDE that a minimizer must satisfy. 
A given PDE is called variational if it has this form. 

If, for example, we take $L(v,z,x) = \tfrac12|v|^2 + G(z)$,
the corresponding Euler-Lagrange equation is the nonlinear
Poisson equation

$\Delta u = g(u)$,

where $g = G'$ and $\Delta u = \sum_{k=1}^{n} u_{x_k x_k}$ is the laplacian
[I.3 §5.4] of u. We have shown that this important
PDE is variational. This is a valuable insight, since
we can then find solutions by constructing minimizers
(or other critical points) of the functional
$F[u] = \int_U \tfrac12|\nabla u|^2 + G(u)\,dx$.
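
For the reader who wants to see the substitution, here is a short sketch of how (10) reduces to the nonlinear Poisson equation for this choice of L; it uses only the symbols already introduced.

```latex
\begin{align*}
L(v,z,x) &= \tfrac12|v|^2 + G(z), \qquad \nabla_v L = v, \qquad L_z = G'(z) = g(z), \\
-\operatorname{div}(\nabla u_0) + g(u_0) &= 0
\;\Longleftrightarrow\; \Delta u_0 = g(u_0).
\end{align*}
```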

4 Further Issues in the Calculus of Variations

Our examples have shown pretty convincingly how 
useful our simple method, called computing the first 
variation, can be when applied to the right geomet- 
rical and physical problems. And indeed, variational 
principles and methods appear in several branches 
of mathematics and physics. Many of the objects 
that mathematicians consider most important have 
an underlying variational principle of some kind. The 
list is impressive and, besides the examples we have
discussed, includes Hamilton's equations, the Yang-Mills
and Seiberg-Witten equations, various nonlinear
wave equations, Gibbs states in statistical physics, and
dynamic programming equations from optimal control
theory.

Many issues remain. For example, if f = f(t) has a
local minimum at a point $t_0$, then we know not only
that $(df/dt)(t_0) = 0$, but also that $(d^2f/dt^2)(t_0) \ge 0$.
The attentive reader will correctly guess that a generalization
of this observation, called computing the second
variation, is important for the calculus of variations.
It provides an insight into appropriate convexity
conditions that are needed to ensure that critical
points are in fact stable minimizers. Even more fundamental
is the question of the existence of minimizers
or other critical points. Here mathematicians have
devoted great ingenuity to designing appropriate function
spaces within which “generalized” solutions can be
found. But these weak solutions need not be smooth,
and so the further question of their regularity and/or
possible singularities must then be addressed.

However, these are all highly technical mathemat- 
ical issues, far beyond the scope of this article. We 
end our discussion here, in the hope that our excessive
demands upon the reader’s attention have been
minimized.

Two simple examples of varieties are the circle and
the parabola, which can be defined by the polynomial
equations $x^2 + y^2 = 1$ and $y = x^2$, respectively.
With one qualification, a variety is the solution set of
a system of polynomial equations. The qualification is
that there are certain examples that we do not want to
include. For instance, the set of solutions to the equation
$x^2 - y^2 = 0$ is the union of the two lines x = y
and x = -y, which naturally splits into two pieces. So
the solution set to a system of polynomial equations
is called an algebraic set, and it is called a variety if it
cannot be written as a union of smaller algebraic sets.

The examples just given were subsets of the plane
$\mathbb{R}^2$. However, the concept is much more general: varieties
can live in $\mathbb{R}^n$ for any n, and also in $\mathbb{C}^n$ for any n.
Indeed, the definitions make sense, and are interesting
and important, in $F^n$, where F can be any field.

The varieties defined so far have been affine varieties.
For many purposes it is more convenient to
deal with projective varieties. The definition is similar,
but now they live inside a projective space [III.74],
and the polynomials used to define them must be
homogeneous; that is, any multiple of a solution must
still be a solution.

See algebraic geometry [IV. 7] and arithmetic
geometry [IV.6] for more information.


III.98 Vector Bundles 


Let X be a topological space [III.92]. A vector bun- 
dle over X is, roughly speaking, a way of associating a 
vector space with each point x of X in such a way that 
these spaces “vary continuously” as you vary x. As an 
example, consider a smooth surface X in $\mathbb{R}^3$. Associated
with each point x is the tangent plane at x, which varies
continuously with x and can be identified in a natural
way with a two-dimensional vector space. A more precise
definition is as follows: a vector bundle of rank n
over X is a topological space E, together with a continuous
map $p : E \to X$, such that the inverse image
$p^{-1}(x)$ of each point x (that is, the set of points in E
that map to x) is an n-dimensional vector space. The
most obvious vector bundle of rank n over X is the
space $\mathbb{R}^n \times X$ with the map p(v, x) = x; this is called
the trivial bundle. However, the interesting bundles are 
the nontrivial ones, such as the tangent bundle of the 
2-sphere. One can learn a great deal about a topological 
space by understanding its vector bundles. For this rea- 
son, vector bundles are central to algebraic topology. 
See algebraic topology [IV. 10 §5] for more details. 


III.99 Von Neumann Algebras 


A unitary representation of a group [I.3 §2.1] G is a
homomorphism [I.3 §4.1] that associates with each element
g of G a unitary map [III.52 §3.1] $U_g$ defined on
some hilbert space [III.37] H. A von Neumann algebra
is a special kind of C*-algebra [III.12], intimately
connected with the theory of unitary representations.
There are several equivalent ways of defining von Neumann
algebras. One is as follows. It can be checked
that, given any unitary representation, its commutant,
defined to be the set of all operators [III.52] in B(H)
that commute with every single unitary map $U_g$ in the
representation, forms a C*-algebra. Von Neumann algebras
are algebras that arise in this way. They can also
be defined abstractly as follows: a C*-algebra A is a
von Neumann algebra if there is a banach space [III.64]
X such that the dual [III.19 §4] of X is A (when A is
itself considered as a Banach space).

The basic building blocks of von Neumann algebras
are special kinds of von Neumann algebras called factors.
The classification of factors is a major topic of
research, which includes some of the most celebrated
theorems of the second half of the twentieth century.
See operator algebras [IV.19 §2] for more details.


III.100 Wavelets


If you wish to send a black and white picture from one
computer to another, then an obvious way of doing it is
to encode it pixel by pixel: 0 for black and 1 for white.
However, for certain pictures this would obviously be
extremely inefficient. For instance, if the picture is a
square, the left half of which is entirely white and the
right half of which is entirely black, then it is clearly
much better to send a set of instructions for reconstructing
the picture than a list of every single pixel.
Furthermore, the precise details of the pixels usually do
not matter: if you want a patch of gray, then it is enough
to put in black and white pixels in the right proportion
and make sure that they are evenly distributed.

However, finding a good way of encoding pictures is 
difficult, and an important area of research in engineer- 
ing. A picture can be thought of as a function from 
a rectangle to R. The set of all such functions forms 
a vector space [1.3 §2.3], and a natural way to try to 
come up with a good encoding is to find a good basis 
for this space. Here “good” means that the functions 
one is interested in (that is, ones that correspond to 
the kinds of pictorial representations one might wish to 
send) are determined by just a few of their coefficients, 
apart from minor variations that are not detectable by 
the human eye. 

Wavelets are a particularly good basis for many purposes.
In some ways they are like fourier transforms
[III.27], but they are much better suited to encoding
details such as sharp boundaries, and patterns that are
“localized,” rather than spread throughout the picture.
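
To make the idea of "just a few coefficients" concrete, here is a minimal Python sketch using the Haar wavelet, which is one simple choice of wavelet basis assumed here for illustration (the text does not single it out). The function name `haar_transform` and the one-dimensional "row of pixels" are likewise illustrative. The row encodes the half-white, half-black picture discussed earlier.

```python
def haar_transform(signal):
    """Unnormalized Haar transform of a list whose length is a power of two.

    Returns [overall average] followed by the detail coefficients,
    ordered from the coarsest scale to the finest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        averages = [(a + b) / 2 for a, b in pairs]
        differences = [(a - b) / 2 for a, b in pairs]
        details = differences + details  # coarser details go in front
        approx = averages
    return approx + details


# One row of the picture: left half white (1), right half black (0).
row = [1, 1, 1, 1, 0, 0, 0, 0]
coefficients = haar_transform(row)
nonzero = [c for c in coefficients if c != 0]
print(coefficients)
print(len(nonzero), "nonzero coefficients out of", len(coefficients))
```

In this basis the sharp boundary in the middle of the row is captured by a single detail coefficient, whereas the pixel-by-pixel encoding needs all eight values.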


For more details, see wavelets and applications
[VII.3].


III.101 The Zermelo-Fraenkel Axioms


The Zermelo-Fraenkel, or ZF, axioms are a collection of
axioms that provide a foundation for set theory. They
may be viewed in two ways. The first is as a list of the
“allowed operations” on sets. For example, there is an
axiom that states that, given sets x and y, there exists
a “pair set,” whose members are precisely x and y.

One of the reasons the ZF axioms are important is
that it is possible to reduce all of mathematics to set
theory, so the ZF axioms can be regarded as a foundation
for mathematics as a whole. Of course, for this to
be the case it is vital that the operations allowed by the
ZF axioms do indeed allow one to perform all of the
usual mathematical constructions. Some of the axioms
are rather subtle as a result.

The other way to view the ZF axioms is as giving us
just what we need to “build up” the world of all sets,
starting with just the empty set. One can look at the
various ZF axioms and see that each one plays an essential
role as we create the set-theoretic universe. Equivalently,
they are “closure rules” that any universe of sets,
or more precisely any model of set theory, should obey.
So, for example, there is an axiom that says that every
set has a power set (the set of all its subsets), and this
axiom allows us to build up a huge collection of sets
starting with just the empty set: one obtains the power
set of the empty set, the power set of the power set
of the empty set, and so on. Indeed, the universe of all
sets could (in a sense) be described as the closure of
the empty set under all the allowable operations of ZF.
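
The first few steps of this building-up process can be imitated with finite sets in Python. The sketch below is only an illustration with hereditarily finite sets; the helper names `power_set` and `pair_set` are made up for the example, and `frozenset` stands in for "set".

```python
from itertools import combinations


def power_set(s):
    """Return the set of all subsets of the frozenset s."""
    elements = list(s)
    return frozenset(
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
    )


def pair_set(x, y):
    """Return the set whose members are precisely x and y."""
    return frozenset({x, y})


# Start with the empty set and apply the power-set operation repeatedly.
empty = frozenset()
stages = [empty]
for _ in range(4):
    stages.append(power_set(stages[-1]))

sizes = [len(stage) for stage in stages]
print(sizes)

# The pair set of the empty set and {empty} already appears at the second stage.
print(pair_set(empty, stages[1]) == stages[2])
```

The sizes grow as iterated powers of two, which gives a feel for how quickly the universe expands even from the empty set.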

The ZF axioms are written in the language of first-order
logic [IV.2 §1]. So each axiom can mention variables
(which are interpreted as ranging over all sets),
as well as the usual logical operations, and also one
“primitive relation,” namely membership. For example,
the pair-set axiom above would be formally written as

$(\forall x)(\forall y)(\exists z)(\forall t)(t \in z \Leftrightarrow t = x \text{ or } t = y)$.

By convention, the ZF axioms do not include the
axiom of choice [III.1]; when one does include the
axiom of choice, the axioms are usually called the “ZFC
axioms.”

For a more detailed discussion of the ZF axioms see
set theory [IV. 1 §3.1].



Part IV 

Branches of Mathematics 


IV.1 Algebraic Numbers

Barry Mazur 

The roots of our subject go back to ancient Greece while
its branches touch almost all aspects of contemporary
mathematics. In 1801 the Disquisitiones Arithmeticae
of carl friedrich gauss [VI.26] was first published, a
“founding treatise,” if ever there was one, for the modern
attitude toward number theory. Many of the still
unachieved aims of current research can be seen, at
least in embryonic form, as arising from Gauss’s work.

This article is meant to serve as a companion to the 
reader who might be interested in learning, and think- 
ing about, some of the classical theory of algebraic 
numbers. Much can be understood, and much of the 
beauty of algebraic numbers can be appreciated, with a 
minimum of theoretical background. I recommend that 
readers who wish to begin this journey carry in their 
backpacks Gauss’s Disquisitiones Arithmeticae as well 
as Davenport’s The Higher Arithmetic (1992), which is 
one of the gems of exposition of the subject, and which 
explains the founding ideas clearly and in depth using 
hardly anything more than high-school mathematics. 

1 The Square Root of 2 

The study of algebraic numbers and algebraic integers 
begins with, and constantly reverts back to, the study of 
ordinary rational numbers and ordinary integers. The 
first algebraic irrationalities occurred not so much as 
numbers but rather as obstructions to simple answers 
to questions in geometry. 

That the ratio of the diagonal of a square to the length
of its side cannot be expressed as a ratio of whole numbers
is purported to be one of the vexing discoveries
of the early Pythagoreans. But this very ratio, when
squared, is 2:1. So we might (and later mathematicians
certainly did) deal with it algebraically. We can think
of this ratio as a cipher, about which we know nothing
beyond the fact that its square is 2 (a viewpoint taken
toward algebraic numbers by kronecker [VI.48], as we
shall see below). We can write $\sqrt{2}$ in various forms, e.g.,

$\sqrt{2} = |1 - i|$, (1)

and we can think of $1 - i = 1 - e^{2\pi i/4}$ as the world’s simplest
trigonometric sum; we shall see generalizations of
this for all quadratic surds below. We can also view $\sqrt{2}$
as a limit of various infinite sequences, one of which is
given by the elegant continued fraction [III.22]

$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}}$. (2)

Directly connected to this continued fraction (2) is the
Diophantine equation

$2X^2 - Y^2 = \pm 1$ (3)

known as the Pell equation. There are infinitely many
pairs of integers (x, y) satisfying this equation, and
the corresponding fractions y/x are precisely what you
get by truncating the expression in (2). For example, the
first few solutions are (1, 1), (2, 3), (5, 7), and (12, 17),
and the corresponding fractions are

$\frac{1}{1}, \frac{3}{2}, \frac{7}{5}, \frac{17}{12}, \ldots$. (4)

Replace the ±1 on the right-hand side of (3) by zero
and you get $2X^2 - Y^2 = 0$, an equation all of whose
positive real-number solutions (X, Y) have the ratio
$Y/X = \sqrt{2}$, so it is easy to see that the sequence of
fractions (4) (these being alternately larger and smaller
than $\sqrt{2} = 1.414\ldots$) converges to $\sqrt{2}$ in the limit. Even
more striking is that (4) is a list of fractions that best
approximate $\sqrt{2}$. (A rational number a/d is said to
be a best approximant to a real number $\alpha$ if a/d is
closer to $\alpha$ than any rational number of denominator
smaller than or equal to d.) To deepen the picture,
consider another important infinite expression,
the conditionally convergent series

$\frac{\log(\sqrt{2} + 1)}{\sqrt{2}} = 1 - \frac{1}{3} - \frac{1}{5} + \frac{1}{7} + \frac{1}{9} - \cdots \pm \frac{1}{n} + \cdots$. (5)

Figure 1 The outer rectangle has its height-to-width ratio
equal to the golden mean. If you remove a square from it as
indicated in the figure, you are left with a rectangle that has
the golden mean as its width-to-height ratio. This procedure
is of course repeatable.

Here the n range over positive odd numbers, and the
sign of the term ±1/n is plus if n has a remainder of
1 or 7 when divided by 8, and it is minus if n has a
remainder of 3 or 5. This elegant formula (5), which you
are invited to “check out” at least to one digit accuracy
with a calculator, is an instance of the powerful and
general theory of analytic formulas for special values
of L-functions [III.49], which plays the role of a bridge
between the more algebraic and the more analytic sides
of the story. When we allude to this, below, we will call
it “the analytic formula,” for short.
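
The link between the continued fraction (2) and the Pell equation (3) is easy to check by machine. The following Python sketch (the function name `sqrt2_convergents` is an illustrative choice) truncates the continued fraction at increasing depths using exact rational arithmetic and evaluates $2x^2 - y^2$ for each resulting fraction y/x.

```python
from fractions import Fraction


def sqrt2_convergents(count):
    """Truncations of 1 + 1/(2 + 1/(2 + ...)), returned as exact fractions y/x."""
    result = []
    for depth in range(count):
        tail = Fraction(0)
        for _ in range(depth):
            tail = 1 / (2 + tail)  # add one more layer "1/(2 + ...)"
        result.append(1 + tail)
    return result


convergents = sqrt2_convergents(6)
# For each fraction y/x, evaluate the left-hand side of the Pell equation (3).
pell_values = [2 * c.denominator**2 - c.numerator**2 for c in convergents]

for c, value in zip(convergents, pell_values):
    print(f"{c.numerator}/{c.denominator}: 2x^2 - y^2 = {value}")
```

Every truncation gives a solution, with the sign on the right-hand side alternating, which matches the fractions lying alternately below and above $\sqrt{2}$.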


2 The Golden Mean 

If you are looking for quadratic irrationalities that have
been the subject of geometric fascination through the
ages, then $\sqrt{2}$ has a strong competitor in the number
$\frac12(1 + \sqrt{5})$, known as the golden mean. The ratio
$\frac12(1 + \sqrt{5}):1$ gives the proportions of a rectangle with
the property that when you remove a square from it, as
in figure 1, you are left with a smaller rectangle whose
sides are in the same proportion. Its corresponding
trigonometric sum description is

$\frac12(1 + \sqrt{5}) = \frac12 + \cos\frac{2\pi}{5} - \cos\frac{4\pi}{5}$. (6)

Its continued-fraction expansion is

$\frac12(1 + \sqrt{5}) = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}$, (7)


where the sequence of fractions obtained by successive
truncations of this continued fraction,

$\frac{1}{1}, \frac{2}{1}, \frac{3}{2}, \frac{5}{3}, \frac{8}{5}, \frac{13}{8}, \frac{21}{13}, \frac{34}{21}, \ldots$, (8)

is a sequence of best rational-number approximants to

$\frac12(1 + \sqrt{5}) = 1.618033988749894848\ldots$,

where “best” has the sense already mentioned. For
example, the fraction

$\frac{34}{21}$

equals 1.619047619047619047... and is closer to the
golden mean than any fraction with denominator less
than 21.
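
The claim about 34/21 can be verified by brute force. In the Python sketch below, the helper names are illustrative, the golden mean is represented in floating point (ample precision for denominators this small), and the check compares 34/21 against the nearest fraction with each smaller denominator.

```python
from fractions import Fraction
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # the golden mean, as a floating-point number


def golden_convergents(count):
    """Truncations of 1 + 1/(1 + 1/(1 + ...)): ratios of consecutive Fibonacci numbers."""
    a, b = 1, 1
    result = []
    for _ in range(count):
        result.append(Fraction(b, a))
        a, b = b, a + b
    return result


def beats_smaller_denominators(frac):
    """True if frac is closer to PHI than every fraction with a smaller denominator."""
    error = abs(float(frac) - PHI)
    for d in range(1, frac.denominator):
        nearest_numerator = round(PHI * d)  # best numerator for this denominator
        if abs(nearest_numerator / d - PHI) <= error:
            return False
    return True


fractions_list = golden_convergents(8)
print([f"{f.numerator}/{f.denominator}" for f in fractions_list])
print(beats_smaller_denominators(Fraction(34, 21)))
```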

Nevertheless, the exclusive appearance of 1s in this
continued fraction¹ can be used to show that, among
all irrational real numbers, the golden mean is the
number that is, in a specific technical sense, least well
approximated by rational numbers.

Readers familiar with the sequence of Fibonacci numbers
will recognize them in the successive denominators
of (8), and in the numerators as well. The analogue
to equation (3) is

$X^2 + XY - Y^2 = \pm 1$. (9)

This time, if you replace the ±1 on the right-hand side
of the equation by 0, you get the equation $X^2 + XY - Y^2 = 0$,
whose positive real-number solutions (X, Y)
have the ratio $Y/X = \frac12(1 + \sqrt{5})$, that is, the golden
mean. And now the numerators and denominators y, x
that appear in (8) run through the positive integral
solutions of (9). The analogue of formula (5) (the “analytic
formula”) for the golden mean is the conditionally
convergent infinite sum

$\frac{2\log(\frac12(1 + \sqrt{5}))}{\sqrt{5}} = 1 - \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{6} - \cdots \pm \frac{1}{n} + \cdots$, (10)

where the n range over positive integers not divisible
by 5, and the sign of ±1/n is plus if n has a remainder
of ±1 when divided by 5, and minus otherwise.

1. The continued-fraction expansion of any real quadratic algebraic
number has an eventually recurring pattern in its entries, as is vividly
exhibited by the two examples (2) and (7) given above.

What governs the choice of the plus terms and minus 
terms is whether or not n is a quadratic residue mod- 
ulo 5. Here is a brief explanation of this terminology. 
If m is an integer, two integers a, b are said to be congruent
modulo m (in symbols we write $a \equiv b \bmod m$)
if the difference a - b is an integral multiple of m; if a,
b, and m are positive numbers, it is equivalent to ask 
that a and b have the same “remainder” (sometimes 
also called “residue”) when each is divided by m (see 
modular arithmetic [III.60]). An integer a relatively 
prime to m is called a quadratic residue modulo m if a 
is congruent to the square of some integer, modulo m; 
otherwise it is called a quadratic nonresidue modulo m. 
So, 1, 4, 6, 9, . . . are quadratic residues modulo 5, while 
2, 3, 7, 8, . . . are quadratic nonresidues modulo 5. 
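
The definition translates directly into a few lines of Python. This is a small illustrative sketch; the function names are not from the text.

```python
from math import gcd


def quadratic_residues(m):
    """Residue classes a mod m, with gcd(a, m) = 1, that are squares modulo m."""
    return sorted({(x * x) % m for x in range(1, m) if gcd(x, m) == 1})


def is_quadratic_residue(a, m):
    """Decide whether a (relatively prime to m) is a quadratic residue modulo m."""
    if gcd(a, m) != 1:
        raise ValueError("a must be relatively prime to m")
    return a % m in quadratic_residues(m)


residues = [n for n in range(1, 10) if n % 5 and is_quadratic_residue(n, 5)]
nonresidues = [n for n in range(1, 10) if n % 5 and not is_quadratic_residue(n, 5)]
print(residues)
print(nonresidues)
```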

A generalization of equations (5) and (10) (the “analytic
formula for the L-function attached to quadratic
Dirichlet characters”) gives a very surprising formula
for the conditionally convergent sum of terms ±1/n,
where n runs through positive integers relatively prime
to a fixed integer and the sign of ±1/n corresponds to
whether n is a quadratic residue or nonresidue modulo
that integer.
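
Formulas (5) and (10) can be "checked out" numerically rather than with a calculator. The Python sketch below sums the first 80,000 terms of each series, using the sign rules stated in the text, and compares the partial sums with the closed-form left-hand sides. The cutoff of 80,000 is an arbitrary choice (a common multiple of 8 and 5), and because the series converge only conditionally the agreement is to a few decimal places, not to machine precision.

```python
from math import log, sqrt


def sign_mod8(n):
    """Sign rule for (5): plus for remainders 1, 7; minus for 3, 5; even n omitted."""
    r = n % 8
    if r in (1, 7):
        return 1
    if r in (3, 5):
        return -1
    return 0


def sign_mod5(n):
    """Sign rule for (10): plus for remainders 1, 4; minus for 2, 3; multiples of 5 omitted."""
    r = n % 5
    if r in (1, 4):
        return 1
    if r in (2, 3):
        return -1
    return 0


def partial_sum(sign, terms):
    """Sum of sign(n)/n for n = 1, ..., terms, in the natural order."""
    return sum(sign(n) / n for n in range(1, terms + 1))


series_5 = partial_sum(sign_mod8, 80_000)
series_10 = partial_sum(sign_mod5, 80_000)
closed_5 = log(sqrt(2) + 1) / sqrt(2)
closed_10 = 2 * log((1 + sqrt(5)) / 2) / sqrt(5)

print(series_5, closed_5)
print(series_10, closed_10)
```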

3 Quadratic Irrationalities 

The quadratic formula

$X = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$

gives the solutions (usually two) to the general quadratic
polynomial equation $aX^2 + bX + c = 0$ as a rational
expression of the number $\sqrt{D}$, where $D = b^2 - 4ac$
is known as the discriminant of the polynomial $aX^2 + bX + c$,
or, equivalently, of the corresponding homogeneous
quadratic form [III.75] $aX^2 + bXY + cY^2$. This
formula introduces many irrational numbers: Plato’s
dialogue “Theaetetus” has the young Theaetetus credited
with the discovery that $\sqrt{D}$ is irrational whenever
D is a natural number that is not a perfect square. The
curious switch, from initially perceiving an obstruction 
to a problem to eventually embodying this obstruction 
as a number or an algebraic object of some sort that we 
can effectively study, is repeated over and over again, 
in different contexts, throughout mathematics. Much 
later, complex quadratic irrationalities also made their 
appearance. Again these were not at first regarded as 
“numbers as such,” but rather as obstructions to the 
solution of problems. Nicholas Chuquet, for example,
in his 1484 manuscript, Le Triparty, raised the question
of whether or not there is a number whose triple
is four plus its square, and he came to the conclusion
that there is no such number because the quadratic
formula applied to this problem yields “impossible”
numbers, i.e., complex quadratic irrationalities in
our terminology.²

For any real quadratic (“integral”) irrationality there
is a discussion along similar lines to the ones we
have just given (expressions (1)-(5) for $\sqrt{2}$ and expressions
(6)-(10) for $\frac12(1 + \sqrt{5})$). For complex irrationalities,
there is also such a theory, but with interesting
twists. For one thing, we do not have anything
directly comparable to continued-fraction expansions
for a complex quadratic irrationality. In fact, the simple,
but true, answer to the problem of how to find an
infinite number of rational numbers that converge to
such an irrationality is that you cannot! Correspondingly,
the analogue of the Pell equation has only finitely
many solutions. As a consolation, however, the appropriate
“analytic formula” has a simpler sum, as we will
see below.

Let d be any square-free integer, positive or negative.
Associated with d is a particularly important number
$\tau_d$, defined as follows. If d is congruent to 1 mod 4 (that
is, if d - 1 is a multiple of 4), then $\tau_d = \frac12(1 + \sqrt{d})$;
otherwise, $\tau_d = \sqrt{d}$. We will refer to these quadratic
irrationalities as fundamental algebraic integers of
degree 2. The general notion of an “algebraic integer”
is defined in section 11. An algebraic integer of degree
two is simply a root of a quadratic polynomial of the
form $X^2 + aX + b$ with a, b ordinary integers. In the
first case (when d is congruent to 1 modulo 4), $\tau_d$ is a root of the
polynomial $X^2 - X + \frac14(1 - d)$ and in the second it is
a root of $X^2 - d$. The reason special names are given
to these quadratic irrationalities is that any quadratic
algebraic integer is a linear combination (with ordinary
integers as coefficients) of 1 and one of these
fundamental quadratic algebraic integers.

4 Rings and Fields 

I think that one of the big early advances in mathematics
is the now-current, universal recognition of the
importance of studying the properties of collections of
mathematical objects, and not just the objects in isolation.
A ring R of complex numbers is a collection of
them that contains 1 and is closed under the operations
of addition, subtraction, and multiplication. That
is, if a, b are any two numbers in R, a ± b and ab must
also be in R. If such a ring R has the further property
that it is closed under division by nonzero elements
(i.e., if a/b is again in R whenever a and b are, and
b ≠ 0), then we say that R is a field. (These concepts
are discussed further in fields [I.3 §2.2] and rings,
ideals, and modules [III.83].) The ring Z of ordinary
integers, {0, ±1, ±2, ...}, is our “founding example” of a
ring; visibly, it is the smallest ring of complex numbers.

2. bombelli [VI.8], in the sixteenth century, would refer to irrational
square roots, of positive or of negative numbers, as “deaf” (reminiscent
of the word surd that is still in use) and as “numbers impossible

Figure 2 The Gaussian integers are the vertices of
this lattice of squares tiling the complex plane.

The collection of all real or complex numbers that are
integral linear combinations of 1 and $\tau_d$ is closed under
addition, subtraction, and multiplication, and is therefore
a ring, which we denote by $R_d$. That is, $R_d$ is the
set of all numbers of the form $a + b\tau_d$ where a and b
are ordinary integers. These rings $R_d$ are our first, basic,
examples of rings of algebraic integers beyond that prototype,
Z, and they are the most important rings that
are receptacles for quadratic irrationalities. Every quadratic
irrational algebraic integer is contained in exactly
one $R_d$.
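
Closure under multiplication comes down to the fact that the square of the fundamental quadratic irrationality is again an integral combination of 1 and itself. The Python sketch below makes this explicit; representing the element a + b·tau as the integer pair `(a, b)`, and the helper names, are choices made for this illustration.

```python
def tau_squared(d):
    """Return integers (p, q) with tau_d**2 = p + q*tau_d, for square-free d."""
    if d % 4 == 1:
        # tau_d = (1 + sqrt(d))/2 satisfies tau**2 = tau + (d - 1)/4
        return ((d - 1) // 4, 1)
    # tau_d = sqrt(d) satisfies tau**2 = d
    return (d, 0)


def multiply(x, y, d):
    """Product of a + b*tau_d and c + e*tau_d, again as a pair of integers."""
    a, b = x
    c, e = y
    p, q = tau_squared(d)
    # (a + b*tau)(c + e*tau) = ac + (ae + bc)*tau + be*tau**2
    return (a * c + b * e * p, a * e + b * c + b * e * q)


print(multiply((0, 1), (0, 1), -1))   # i * i in the Gaussian integers
print(multiply((1, 1), (1, -1), -1))  # (1 + i)(1 - i)
print(multiply((0, 1), (0, 1), 5))    # square of the golden mean
```

Because the result is always a pair of ordinary integers, the product of two elements of the ring never leaves the ring.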

For example, when d = -1 the corresponding ring
$R_{-1}$, usually referred to as the ring of Gaussian integers,
consists of the set of complex numbers whose real and
imaginary parts are ordinary integers. These complex
numbers may be visualized as the vertices of the infinite
tiling of the complex plane by squares whose sides
have length 1 (see figure 2).

When d = -3 the complex numbers in the corresponding
ring $R_{-3}$ may be visualized as the vertices of
the regular hexagonal tiling of the complex plane (see
figure 3).

Figure 3 The elements of the ring $R_{-3}$ are the vertices of
this lattice of hexagons tiling the complex plane.

With the rings $R_d$ in hand, we may ask ring-theoretic
questions about them, and here is some of the standard
vocabulary useful for this. A unit u in a given ring
R of complex numbers is a number in R whose reciprocal
1/u is also in R; a prime (or synonymously, an
irreducible) element in R is a nonunit that cannot be
written as the product of two nonunits in R. A ring of
complex numbers R has the unique factorization property
if every nonzero, nonunit, algebraic number in R
can be expressed as a product of prime elements in
exactly one way (where two factorizations are counted
as the same if one can be obtained from the other by
rearranging the order in which the primes appear and
multiplying them by units).

In the prototype ring Z of ordinary integers, the only
units are ±1. The fundamental fact that any ordinary
integer greater than 1 can be uniquely expressed as
a product of (positive) prime numbers (that is, that Z
enjoys the unique factorization property) is crucial for
much of the number theory done with ordinary integers.
That this unique factorization property for integers
actually required proof was itself a hard-won realization
of Gauss, who also provided its proof (see the
fundamental theorem of arithmetic [V.16]).

It is easy to see that there are only four units in the
ring of Gaussian integers, namely ±1 and ±i; multiplication
by any of these units effects a symmetry
of the infinite square tiling (figure 2 above). There are
only six units in the ring $R_{-3}$, namely ±1, $\pm\frac12(1 + \sqrt{-3})$,
and $\pm\frac12(1 - \sqrt{-3})$; multiplication by any of these units
results in a symmetry of the infinite hexagonal tiling
(figure 3 above).

Fundamental to understanding the arithmetic of $R_d$
is the following question: which ordinary prime numbers
p remain prime in $R_d$ and which ones factorize
into products of primes in $R_d$? We will see shortly that
if a prime number does factorize in $R_d$, it must be
expressible as the product of precisely two prime factors.
For example, in the ring of Gaussian integers, $R_{-1}$,
we have the factorizations

$2 = (1 + i)(1 - i)$,

$5 = (1 + 2i)(1 - 2i)$,

$13 = (2 + 3i)(2 - 3i)$,

$17 = (1 + 4i)(1 - 4i)$,

$29 = (2 + 5i)(2 - 5i)$,

where all the Gaussian integer factors in brackets above
are prime in the ring of Gaussian integers.
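
Each factorization displayed above has the shape p = (a + bi)(a - bi), which multiplies out to p = a² + b². A brute-force search for such a and b, sketched below in Python with an illustrative function name, reproduces the displayed data and also reports primes for which no factorization of this shape exists.

```python
from math import isqrt


def two_squares(p):
    """Return (a, b) with 1 <= a <= b and a*a + b*b == p, or None if there is none."""
    for a in range(1, isqrt(p) + 1):
        remainder = p - a * a
        b = isqrt(remainder)
        if b >= a and b * b == remainder:
            return (a, b)
    return None


for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]:
    pair = two_squares(p)
    if pair is None:
        print(f"{p}: no factorization of the form (a + bi)(a - bi)")
    else:
        a, b = pair
        print(f"{p} = ({a} + {b}i)({a} - {b}i)")
```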

Let us say that an odd prime p splits in $R_{-1}$ if it
factorizes into a product of at least two primes and
remains prime if it does not do so. As we shall soon
see, the officially agreed-upon definitions of splitting
and remaining prime for more general rings of algebraic
integers (even ones of the form $R_d$) are worded
slightly, but very significantly, differently from the way
we have just defined these concepts in the ring $R_{-1}$
of Gaussian integers. (Note that we have excluded the
prime p = 2 from the above dichotomy. This is because
2 ramifies in $R_{-1}$; for a discussion of this concept see
section 7 below.) In any event, there is an elementary
computable rule that tells us, for any $R_d$, which primes
p split and which remain prime in this agreed sense.
The rule depends upon the residue of p modulo 4d;
the reader is invited to guess it for the ring of Gaussian
integers given the data just displayed above. In general,
an elementary computable rule that says which primes
split and which do not in a ring of algebraic integers
such as $R_d$ is referred to as a splitting law for the ring
of algebraic integers in question.

5 The Rings $R_d$ of Quadratic Integers

There is a very important “symmetry,” or automorphism
[I.3 §4.1], defined on the ring $R_d$. It sends $\sqrt{d}$ to
$-\sqrt{d}$, keeps all ordinary integers fixed, and more generally,
for rational numbers u and v, it sends $\alpha = u + v\sqrt{d}$
to what we may call its algebraic conjugate $\alpha' = u - v\sqrt{d}$.
(The word “algebraic” is to remind you that this
is not necessarily the same as the complex-conjugate
symmetry of the complex numbers!)


You can immediately work out the formulas for this
algebraic conjugation operation on the fundamental
quadratic irrationalities $\tau_d$: if d is not congruent to 1
modulo 4, then $\tau_d = \sqrt{d}$, so obviously $\tau_d' = -\tau_d$, while
if d is congruent to 1 modulo 4, then $\tau_d = \frac12(1 + \sqrt{d})$
and $\tau_d' = \frac12(1 - \sqrt{d}) = 1 - \tau_d$. This symmetry $\alpha \mapsto \alpha'$
respects all algebraic formulas. For example, to work
out the algebraic conjugate of a polynomial expression
like $\alpha\beta + 2\gamma^2$, where $\alpha$, $\beta$, and $\gamma$ are numbers in $R_d$,
you just replace each individual number by its algebraic
conjugate, obtaining the expression $\alpha'\beta' + 2\gamma'^2$.

The most telling integer quantity attached to a number
$\alpha = x + y\tau_d$ in $R_d$ is its norm $N(\alpha)$, which is defined
to be the product $\alpha\alpha'$. This equals $x^2 - dy^2$ when
$\tau_d = \sqrt{d}$ and $x^2 + xy - \frac14(d - 1)y^2$ when $\tau_d = \frac12(1 + \sqrt{d})$.
The norm turns out to be multiplicative, meaning that
$N(\alpha\beta) = N(\alpha)N(\beta)$, as you can directly check by multiplying
out the formula for the norm of each factor and
comparing with the norm of the product. This gives us
a useful tactic for trying to factorize algebraic numbers
in $R_d$, and offers criteria for determining whether
a number $\alpha$ in $R_d$ is a unit, and whether it is prime in
$R_d$. In fact, an element $\alpha \in R_d$ is a unit if and only if
$N(\alpha) = \alpha\alpha' = \pm 1$; in other words, the units are given
by the integral solutions to the equations

$X^2 - dY^2 = \pm 1$ (11)

or

$X^2 + XY - \frac14(d - 1)Y^2 = \pm 1$ (12)

following the two cases. Here is the proof of this. If
$\alpha = x + y\tau_d$ is a unit in $R_d$, then its reciprocal, $\beta = 1/\alpha$,
must also be in $R_d$, and, of course, we have $\alpha\beta = 1$.
Applying the norm to both sides of this equation and
using the multiplicative property discussed above, we
see that $N(\alpha)$ and $N(\beta)$ are reciprocal ordinary integers.
Therefore, they are either both equal to +1 or
both equal to -1. This shows that (x, y) is a solution
to whichever of equation (11) or (12) is appropriate. In
the other direction, if $N(\alpha) = \alpha\alpha' = \pm 1$, then the reciprocal
of $\alpha$ is simply $\pm\alpha'$. This is in $R_d$, so $\alpha$ is indeed a
unit in $R_d$.
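
The norm criterion turns the search for units into integer arithmetic. The Python sketch below implements the two norm formulas and lists the units whose coordinates lie in a small box; the function names and the box size are illustrative choices, and for positive d such a box can of course capture only finitely many of the units.

```python
def norm(x, y, d):
    """Norm of x + y*tau_d for square-free d, following the two cases in the text."""
    if d % 4 == 1:
        return x * x + x * y - ((d - 1) // 4) * y * y
    return x * x - d * y * y


def units_in_box(d, bound):
    """All (x, y) with |x|, |y| <= bound for which x + y*tau_d has norm +1 or -1."""
    return [
        (x, y)
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if abs(norm(x, y, d)) == 1
    ]


print(units_in_box(-1, 3))  # units of the Gaussian integers
print(units_in_box(-3, 3))  # units of the ring with d = -3
print(norm(1, 1, 2))        # norm of 1 + sqrt(2)
print(norm(0, 1, 5))        # norm of the golden mean
```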

These homogeneous quadratic forms, the left-hand 
sides of equations (11) and (12) (which generalize for- 
mulas (3) and (9)), play an important role; let us refer 
to whichever of them is relevant to Rd as the funda- 
mental quadratic form for Rd, and to its discriminant 
D as the fundamental discriminant. (D is equal to d 
if d is congruent to 1 modulo 4 and to 4 d otherwise.) 
When d is negative there are only finitely many units 
(if d < -3 the only ones are ±1) but when d is positive, 
so that $R_d$ consists entirely of real numbers, there are
infinitely many. The ones that are greater than 1 are
powers of a smallest such unit, $\varepsilon_d$, and this is called
the fundamental unit.

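For example, when $d = 2$ the fundamental unit, $\varepsilon_2$,
is $1 + \sqrt{2}$, and when $d = 5$ it is the golden mean,
$\varepsilon_5 = \tfrac{1}{2}(1 + \sqrt{5})$. Since any power of a unit is again a unit,
we immediately have a machine for producing infinitely
many units from any single one. For example, taking
powers of the golden mean, we get

$$\varepsilon_5 = \tfrac{1}{2}(1 + \sqrt{5}), \qquad \varepsilon_5^2 = \tfrac{1}{2}(3 + \sqrt{5}),$$

$$\varepsilon_5^3 = 2 + \sqrt{5}, \qquad \varepsilon_5^4 = \tfrac{1}{2}(7 + 3\sqrt{5}),$$

$$\varepsilon_5^5 = \tfrac{1}{2}(11 + 5\sqrt{5}),$$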

all of which are units in $R_5$. The study of these fundamental
units was already under way in the twelfth
century in India, but in general their detailed behavior
as $d$ varies still holds mysteries for us today. For example,
there is a deep theorem of Hua (1942) that tells
us that $\varepsilon_d < (4e^2 d)^{\sqrt{d}}$ (for a proof of it along with a
historical discussion of such estimates, see chapters 3
and 8 in Narkiewicz (1973)). There are examples of $d$
that come close to attaining that bound, but we still
do not know whether or not there is a positive number
$\eta$ and an infinity of square-free $d$ for which $\varepsilon_d > d^{d^{\eta}}$.
(The answer to this question would be yes if, for example,
there were an infinity of $R_d$ satisfying the unique
factorization property! This follows from a famous theorem
of Brauer (1947) and Siegel (1935); for a proof of
the Brauer-Siegel theorem, see theorem 8.2 of chapter 8
in Narkiewicz (1973) or Lang (1970).)
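
The arithmetic of units can be checked mechanically. The following is a minimal Python sketch, not taken from the text: it stores $x + y\tau_d$ as the integer pair `(x, y)`, and the helper names `tau_relation`, `mul`, `norm`, and `powers` are illustrative assumptions. It reproduces the five powers of the golden mean listed above and confirms that each has norm $\pm 1$.

```python
def tau_relation(d):
    """Return (t, n) such that tau_d**2 = t*tau_d + n."""
    if d % 4 == 1:
        return 1, (d - 1) // 4   # tau_d = (1 + sqrt(d)) / 2
    return 0, d                  # tau_d = sqrt(d)


def mul(u, v, d):
    """Multiply x1 + y1*tau_d by x2 + y2*tau_d."""
    (x1, y1), (x2, y2) = u, v
    t, n = tau_relation(d)
    return (x1 * x2 + n * y1 * y2, x1 * y2 + x2 * y1 + t * y1 * y2)


def norm(u, d):
    """Norm of x + y*tau_d: the left-hand side of equation (11) or (12)."""
    x, y = u
    t, n = tau_relation(d)
    return x * x + t * x * y - n * y * y


def powers(u, d, count):
    """Return [u, u**2, ..., u**count] as coefficient pairs."""
    out, acc = [], (1, 0)
    for _ in range(count):
        acc = mul(acc, u, d)
        out.append(acc)
    return out


# The golden mean is tau_5 itself, i.e. the pair (0, 1).
golden_powers = powers((0, 1), 5, 5)
golden_norms = [norm(u, 5) for u in golden_powers]
print(golden_powers)     # pairs (x, y) meaning x + y*(1 + sqrt(5))/2
print(golden_norms)
print(norm((1, 1), 2))   # norm of 1 + sqrt(2)
```

In this encoding the pair `(2, 3)` stands for $2 + 3\tau_5 = \tfrac{1}{2}(7 + 3\sqrt{5})$, so the printed pairs can be compared directly with the displayed list.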

6 Binary Quadratic Forms and the 
Unique Factorization Property 

The principle of unique factorization is an all-important
fact for the ring of ordinary integers $\mathbb{Z}$. The question
of whether this principle does or does not hold
for a given ring $R_d$ is central to algebraic number
theory. There are helpful, analyzable obstructions
to the validity of unique factorization in $R_d$. These
obstructions, in turn, connect with profound arithmetic
issues, and have become the focus of important study
in their own right. One such mode of expressing the
obstruction to unique factorization is already prominent
in Gauss’s Disquisitiones Arithmeticae (1801), in
which much of the basic theory of $R_d$ was already laid
down.

This “obstruction” has to do with how many “essentially
different” binary quadratic forms $aX^2 + bXY + cY^2$
there are with discriminant equal to the fundamental
discriminant $D$ of $R_d$. (Recall that the discriminant
of $aX^2 + bXY + cY^2$ is $b^2 - 4ac$, and that $D$ equals $4d$
unless $d \equiv 1 \bmod 4$, in which case it equals $d$.)

In order to define a binary quadratic form $aX^2 + bXY + cY^2$
of discriminant $D$, what you need to provide
is simply a triplet of coefficients $(a, b, c)$ such that
$b^2 - 4ac = D$. Given such a form, one can use it to define
other ones. For example, if we make a small linear
change of the variables, replacing $X$ by $X - Y$ and keeping
$Y$ fixed, then we get $a(X - Y)^2 + b(X - Y)Y + cY^2$,
which simplifies to $aX^2 + (b - 2a)XY + (c - b + a)Y^2$.
That is, we get a new binary quadratic form whose
triplet of coefficients is $(a, b - 2a, c - b + a)$, and which
(as can easily be checked) has the same discriminant
$D$. We can “reverse” this change by replacing $X$ by
$X + Y$ and keeping $Y$ fixed. If we do this reversal and
perform the corresponding simplification then we get
back our original binary quadratic form. Because of this
reversibility, these two quadratic forms take exactly
the same set of integer values as $X$ and $Y$ vary: it is
therefore reasonable to think of them as equivalent.

More generally, then, one says that two binary quadratic
forms are equivalent if one can be turned into the
other (or minus the other) by any “reversible” linear
change of variables with integer coefficients. That is,
one chooses integers $r, s, u, v$ such that $rv - su = \pm 1$,
replaces $X$ and $Y$ by the linear combinations
$X' = rX + sY$, $Y' = uX + vY$, and simplifies the resulting
expression to get a new triplet of coefficients. The
condition $rv - su = \pm 1$ guarantees that by a similar
operation we can get back to our original binary quadratic
form, and also that the new binary quadratic form has
the same discriminant $D$ as the old one. So when we talk
of “essentially different” binary quadratic forms of
discriminant $D$ we mean that we cannot turn one into the
other by this kind of change of variables.
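
A change of variables of this kind is easy to carry out by machine. The sketch below is illustrative only: the function names `discriminant` and `change_variables` and the sample form $(3, 5, 7)$ are assumptions made for the example. It expands the substitution $X \mapsto rX + sY$, $Y \mapsto uX + vY$ and checks that the shift $X \mapsto X - Y$ produces the triplet $(a, b - 2a, c - b + a)$, preserves the discriminant, and is undone by $X \mapsto X + Y$.

```python
def discriminant(form):
    """Discriminant b^2 - 4ac of the form a*X^2 + b*X*Y + c*Y^2."""
    a, b, c = form
    return b * b - 4 * a * c


def change_variables(form, r, s, u, v):
    """Substitute X -> r*X + s*Y, Y -> u*X + v*Y and collect coefficients."""
    if r * v - s * u not in (1, -1):
        raise ValueError("need r*v - s*u = +1 or -1 for a reversible change")
    a, b, c = form
    return (
        a * r * r + b * r * u + c * u * u,
        2 * a * r * s + b * (r * v + s * u) + 2 * c * u * v,
        a * s * s + b * s * v + c * v * v,
    )


form = (3, 5, 7)                                  # arbitrary sample form
shifted = change_variables(form, 1, -1, 0, 1)     # X -> X - Y, Y fixed
restored = change_variables(shifted, 1, 1, 0, 1)  # X -> X + Y, Y fixed

print(shifted, discriminant(form), discriminant(shifted))
print(restored)
```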

Here is the surprising obstruction to unique factor- 
ization that Gauss discovered. 

The unique factorization principle is valid in $R_d$ if
and only if every homogeneous quadratic form $aX^2 + bXY + cY^2$
with discriminant equal to the fundamental
discriminant of $R_d$ is equivalent to the fundamental
quadratic form of $R_d$.

Furthermore, the collection of inequivalent quadratic 
forms whose discriminant is the fundamental discrim- 
inant of Rd expresses in concrete terms the degree to 
which Rd “enjoys unique factorization.” 





If you have never seen this theory of binary quadratic
forms before, try your hand at working with quadratic
forms in the case where $D = -23$. The idea is to start
with some particular quadratic form $aX^2 + bXY + cY^2$
of your choice with discriminant $D = b^2 - 4ac = -23$.
Then, using a sequence of carefully chosen linear
changes of variables you reduce the size of the coefficients
$a$, $b$, and $c$ until you can go no further. Eventually
you should end up with one of the two (inequivalent)
quadratic forms that there are with discriminant $-23$:
the fundamental form $X^2 + XY + 6Y^2$, or the form
$2X^2 + XY + 3Y^2$. For example, can you see that the
binary quadratic form $X^2 + 3XY + 8Y^2$ is equivalent
to $X^2 + XY + 6Y^2$?

This type of exercise offers a small hint of the role 
that the geometry of numbers will play in the even- 
tual theory. As you might expect from the venerability 
of these ideas, elegant streamlined methods have been 
discovered for making such calculations. Nevertheless, 
it is an open secret that any working mathematician, 
contemporary or ancient, engaged in this subject or 
nearby subjects, has done a myriad of straightforward 
simple hand computations along the lines of the above
exercise. 

If you try a few examples of this exercise, as I hope 
you do, here is one way of organizing your calcula- 
tions. First, find a simple reversible linear change of 
variables to turn your form into an equivalent one with 
$a, b, c \geq 0$. (You may also have to multiply the whole
form by -1.) 

The cleanest way of writing down all binary quadratic
forms given by triplets $(a, b, c)$ of discriminant $-23$ is
to list the triplets in increasing order of $b$, which will
now be an odd positive integer. For each value of $b$ you
can then choose $a$ and $c$ in such a way that their product
is $\tfrac{1}{4}(b^2 + 23)$. At this point the aim is to build up a
repertoire of moves that tend to decrease $b$ (which will
keep $a$ and $c$ within bounds as well). A big clue, and aid,
here is that for any pair of relatively prime integers $x, y$,
if you evaluate your quadratic form $aX^2 + bXY + cY^2$ at
$(X, Y) = (x, y)$ to get the integer $a' = ax^2 + bxy + cy^2$,
you can find, for appropriate $b'$ and $c'$, a quadratic form
$a'X^2 + b'XY + c'Y^2$ equivalent to yours, with first
coefficient $a'$. So, one tactic is to look for small integers
represented by your quadratic form. Also the “example”
linear change of variables $X \mapsto X - Y$, $Y \mapsto Y$ will
lead you to be able to reduce the coefficient $b$ to an integer
smaller than $2a$. Can you check that $X^2 + XY + 6Y^2$
and $2X^2 + XY + 3Y^2$ are inequivalent?
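
For readers who want to compare their hand computations with a machine, here is a hedged Python sketch of one way to organise the moves. It uses the classical reduction procedure for forms of negative discriminant (shift $X$ by a multiple of $Y$, then swap $X$ and $Y$ when the first coefficient exceeds the last). That is a standard technique swapped in here, not necessarily the recipe the text has in mind, and the names `reduce_form` and `reduced_forms` are illustrative assumptions.

```python
from math import isqrt


def reduce_form(a, b, c):
    """Shrink a form with a > 0 and b*b - 4*a*c < 0 by reversible integer
    changes of variables until 0 <= b <= a <= c."""
    if a <= 0 or b * b - 4 * a * c >= 0:
        raise ValueError("expected a > 0 and negative discriminant")
    while True:
        # X -> X + k*Y, Y fixed: moves b into the range -a < b <= a.
        k = (a - b) // (2 * a)
        b, c = b + 2 * a * k, a * k * k + b * k + c
        if a <= c:
            break
        # X <-> Y: exchanges the outer coefficients.
        a, c = c, a
    # X -> -X, Y fixed: replaces b by -b if necessary.
    return (a, abs(b), c)


def reduced_forms(D):
    """List all (a, b, c) with b*b - 4*a*c = D < 0 and 0 <= b <= a <= c."""
    forms = []
    # 0 <= b <= a <= c forces 3*a*a <= 4*a*c - b*b = -D.
    for a in range(1, isqrt(-D // 3) + 1):
        for b in range(a + 1):
            numerator = b * b - D
            if numerator % (4 * a) == 0 and numerator // (4 * a) >= a:
                forms.append((a, b, numerator // (4 * a)))
    return forms


print(reduce_form(1, 3, 8))
print(reduce_form(2, 13, 24))
print(reduced_forms(-23))
```

Every move used is one of the reversible changes of variables described above (including the sign change $X \mapsto -X$, which has $rv - su = -1$), so the output is always equivalent to the input in the sense of the text.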


Now, as we have just discussed, it follows from the
general theory that $R_{-23}$ does not have the unique
factorization property. We can also see this directly. For
example,

$$\tau_{-23} \cdot \tau'_{-23} = 2 \cdot 3,$$

and all four of the factors in this equation are irreducible
in $R_{-23}$. To be a faithful companion, I should at
this point give at least a hint at what connection there
might be between this specific “failure of unique
factorization” and the previous discussion. It may become a
bit clearer in the next paragraph, but the underlying
tension in the equation $\tau_{-23} \cdot \tau'_{-23} = 2 \cdot 3$ is that all the
factors in our ring are prime: we are missing any elements
in our ring $R_{-23}$ that could factorize it further.
We lack, for example, elements that play the role of
the greatest common divisor of factors of this equation.
The general theory regarding these matters (which we
are not entering into here, but see Euclid’s algorithm
[III.22]) tells us that what is missing is some element $\gamma$
in $R_{-23}$ that is both a linear combination of the numbers
$\tau_{-23}$ and 2 (with coefficients in the ring $R_{-23}$) and
also a common divisor of $\tau_{-23}$ and 2 in the ring $R_{-23}$,
i.e., such that $\tau_{-23}/\gamma$ and $2/\gamma$ are both in $R_{-23}$. There is
no such element, for its norm must divide $N(\tau_{-23}) = 6$
and $N(2) = 4$, and therefore be equal to 2, which can
easily be shown to be impossible. But we are interested,
rather, in the phenomenon that inequivalence of certain
binary quadratic forms will indeed show this, so let us
go on.
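
The impossibility of norm 2 can be seen by completing the square. The short computation below is one way to do it; the coordinates $x, y$ of a hypothetical element $x + y\tau_{-23}$ are introduced here purely for illustration.

```latex
% Norm on R_{-23}, from equation (12) with d = -23:
\[
  N(x + y\,\tau_{-23}) = x^2 + xy + 6y^2,
  \qquad
  4\,N(x + y\,\tau_{-23}) = (2x + y)^2 + 23\,y^2 .
\]
% If the norm were 2, then (2x + y)^2 + 23 y^2 = 8, which forces y = 0
% and 4x^2 = 8; if it were 3, then y = 0 and 4x^2 = 12.
% Neither equation has an integer solution.
\[
  N(x + y\,\tau_{-23}) \neq 2, \qquad N(x + y\,\tau_{-23}) \neq 3
  \qquad \text{for all integers } x, y .
\]
```

The same computation explains why the four factors are irreducible: since $N(2) = 4$, $N(3) = 9$, and $N(\tau_{-23}) = N(\tau'_{-23}) = 6$, any proper factor of one of them would have to have norm 2 or 3.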

First, check that any linear combination

$$\alpha \cdot \tau_{-23} + \beta \cdot 2$$

with $\alpha$, $\beta$ elements of $R_{-23}$ can also be written as
$u \cdot \tau_{-23} + v \cdot 2$, where $u$ and $v$ are ordinary integers. Now
compute the binary quadratic form given by systematically
taking the norms of these linear combinations,
and viewing these norms as functions of the integer
coefficients $u, v$:

$$N(u \cdot \tau_{-23} + v \cdot 2) = (\tau_{-23}u + 2v)(\tau'_{-23}u + 2v) = 6u^2 + 2uv + 4v^2.$$

Viewing the $u$ and the $v$ as variables, and dubbing them
$U$ and $V$ to emphasize their status as variables, we can
say that the norm quadratic form obtained from the
collection of linear combinations of $\tau_{-23}$ and 2 is

$$6U^2 + 2UV + 4V^2 = 2 \cdot (3U^2 + UV + 2V^2).$$

Now suppose that, contrary to fact, there were a common
divisor, $\gamma$, as above; in particular, the multiples of
$\gamma$ in the ring $R_{-23}$ would then be precisely the linear
combinations of the numbers $\tau_{-23}$ and 2. We would
then have another way of describing those linear
combinations; namely, for any pair of ordinary integers $(u, v)$
there would be a pair of ordinary integers $(r, s)$ such
that

$$u \cdot \tau_{-23} + v \cdot 2 = \gamma \cdot (r\tau_{-23} + s) = r\gamma\tau_{-23} + s\gamma.$$

Taking norms, as above, we would get

$$N(\gamma \cdot (r\tau_{-23} + s)) = N(r\gamma\tau_{-23} + s\gamma) = N(\gamma)(6r^2 + rs + s^2).$$

Again, thinking of $r$ and $s$ as variables and renaming
them $R$ and $S$ we would have the corresponding norm
quadratic form:

$$N(\gamma) \cdot (6R^2 + RS + S^2) = 2 \cdot (6R^2 + RS + S^2).$$

Given the above facts (dependent, of course, on the
contrary-to-fact hypothesis that there is a $\gamma$ as above),
the key idea is that there would be linear changes of
variables from $(U, V)$ to $(R, S)$ and back that would
establish an equivalence between the two quadratic
forms $2 \cdot (3U^2 + UV + 2V^2)$ and $2 \cdot (6R^2 + RS + S^2)$. But
these quadratic forms are not equivalent! Their
inequivalence therefore shows that the putative $\gamma$ does not
exist and factorization in the ring $R_{-23}$ is not unique.
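
One way to confirm the inequivalence is to compare the integers the two forms represent, since equivalent forms take the same set of values. The Python sketch below is illustrative (the name `small_values` is an assumption): it lists the small positive values of a form of negative discriminant, using a search box that is justified in the comments, and shows that $6R^2 + RS + S^2$ takes the value 1 while $3U^2 + UV + 2V^2$ does not.

```python
from math import isqrt


def small_values(a, b, c, bound):
    """Sorted values f with 0 < f <= bound taken by f = a*x^2 + b*x*y + c*y^2
    at integer points, for a form with a > 0 and negative discriminant."""
    abs_disc = 4 * a * c - b * b
    if a <= 0 or abs_disc <= 0:
        raise ValueError("expected a > 0 and negative discriminant")
    # 4*c*f = (2*c*y + b*x)**2 + abs_disc * x**2, so f <= bound forces
    # x*x <= 4*c*bound / abs_disc; symmetrically for y with a in place of c.
    max_x = isqrt(4 * c * bound // abs_disc)
    max_y = isqrt(4 * a * bound // abs_disc)
    values = set()
    for x in range(-max_x, max_x + 1):
        for y in range(-max_y, max_y + 1):
            f = a * x * x + b * x * y + c * y * y
            if 0 < f <= bound:
                values.add(f)
    return sorted(values)


values_uv = small_values(3, 1, 2, 6)   # 3U^2 + UV + 2V^2
values_rs = small_values(6, 1, 1, 6)   # 6R^2 + RS + S^2
print(values_uv)
print(values_rs)
```

Because both forms are positive, the clause “or minus the other” in the definition of equivalence cannot help, so the differing value sets settle the matter.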

7 Class Numbers and the Unique 
Factorization Property 

In the previous section we saw that the collection
of inequivalent quadratic forms of discriminant equal
to the fundamental discriminant provides us with an
obstruction to unique factorization. Somewhat later,
a more articulated version of this obstruction arose,
known as the ideal class group $H_d$ of $R_d$. As its name
implies, to describe this we must use the vocabulary of
ideals [III.83 §2] and groups [I.3 §2.1]. A subset $I$ of
$R_d$ is an ideal if it has the following closure properties:
if $\alpha$ belongs to $I$, so do $-\alpha$ and $\tau_d\alpha$, and if $\alpha$ and $\beta$
belong to $I$, so does $\alpha + \beta$. (The first and third
properties imply together that any integer combination of
$\alpha$ and $\beta$ belongs to $I$.) The basic example of such an
ideal is the set of all multiples of some fixed, nonzero
element $\gamma$ of $R_d$, where by a multiple of $\gamma$ we mean the
product of $\gamma$ and an element of $R_d$. We denote this set
tersely as $(\gamma)$, or, slightly more expressively, as $\gamma \cdot R_d$.
An ideal of this sort, i.e., one that can be expressed as
the set of all multiples of a single nonzero element $\gamma$, is
called a principal ideal. For example, the ring $R_d$ itself
is an ideal (it consists, after all, of all linear combinations
of 1 and $\tau_d$) and is even a principal ideal: in our
laconic terminology, it can be denoted $(1) = 1 \cdot R_d = R_d$.
Strictly speaking, the singleton $\{0\}$ is also an ideal, but
the ones that will interest us are the nonzero ideals.

As a direct counterpart to the obstruction principle 
involving binary quadratic forms that was described in 
the previous section, we have the following obstruction 
principle involving ideals. 

The unique factorization principle is valid in $R_d$ if and
only if every ideal in $R_d$ is principal.

Reflecting on this, you can get a sense of why the word
“ideal” might have been chosen. Every principal ideal
in $R_d$ is of the form $\gamma \cdot R_d$ for some number $\gamma$ in $R_d$
(which is uniquely determined apart from multiplication
by units), but sometimes there are more general
ideals. These arise if you ever have two elements of $R_d$
(think of $\tau_{-23}$ and 2, as in the previous section) such
that the set of all their integer combinations cannot be
expressed as the set of multiples of some fixed number
$\gamma$ in $R_d$. This phenomenon is a sign that we may be
missing numbers in $R_d$ that provide fine enough
factorizations to make the arithmetic in $R_d$ as smooth going
as one might hope for. Just as a principal ideal $\gamma \cdot R_d$
corresponds to the number $\gamma$, ideals of this more general
kind (think of the set of all integer combinations
of $\tau_{-23}$ and 2) can be thought of as corresponding to
“ideal numbers” that should, “by rights,” be present in
our ring, but happen not to be.

Once we think of ideals as standing for ideal numbers
it makes some sense to try to multiply them: if $I$,
$J$ are two ideals in $R_d$, we let $I \cdot J$ denote the set of all
finite sums of products $\alpha \cdot \beta$ in which $\alpha$ is in $I$ and $\beta$
is in $J$. The product of two principal ideals $(\gamma_1) \cdot (\gamma_2)$
is the principal ideal $(\gamma_1 \cdot \gamma_2)$ so, just as one would
hope, multiplication of principal ideals corresponds to
multiplication of the corresponding numbers.
Multiplication of any ideal $I$ by the ideal $(1)$ leaves $I$ unchanged:
$(1) \cdot I = I$; we therefore refer to the ideal $(1)$ as the unit
ideal. With this new notion of multiplication of ideals we
can now give the general definition of what it means for
a prime number $p$ to split or to remain prime in a ring
$R_d$, the definition we promised in section 4.

The idea behind the definition is to use multiplication
of ideals rather than of numbers. So if we are thinking
about a prime $p$, the first thing we do is turn our
attention to the principal ideal $(p)$ in $R_d$. If this can
be factorized as a product of two different ideals (not
necessarily principal ideals, this is the whole point) in
$R_d$, and if neither of these is the unit ideal $(1) = R_d$,
then we say that $p$ splits in $R_d$. If, on the other hand,
no factorization of the ideal $(p)$ can be made without
one of the factors being the ideal $(1) = R_d$, then we
say that $p$ remains prime in $R_d$. There is also a third
important definition: if the principal ideal $(p)$ can be
expressed as the square of another ideal $I$, then we say
that $p$ ramifies in $R_d$. Continuing with the momentum
of this definition, we may say that an ideal $P$ is a prime
ideal if $P$ cannot be “factorized” as the product of two
ideals neither of which is the unit ideal. This definition
makes sense whether or not $P$ is principal, so we
are subtly shifting our attention from the multiplicative
arithmetic of the numbers in $R_d$ to the ideals.
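
As a worked instance of the definition, here is a sketch of why the prime 2 splits in $R_{-23}$. The notation $(\alpha, \beta)$ for the ideal of all combinations of $\alpha$ and $\beta$ with coefficients in the ring is an assumption introduced for this example; it is not used elsewhere in the text.

```latex
% Write \tau = \tau_{-23}, so that \tau + \tau' = 1 and \tau\tau' = 6.
\[
  (2, \tau) \cdot (2, \tau')
    = (4,\; 2\tau',\; 2\tau,\; \tau\tau')
    = (4,\; 2\tau,\; 2\tau',\; 6).
\]
% Every generator on the right is a multiple of 2, and 6 - 4 = 2 lies in
% the product, hence
\[
  (2, \tau) \cdot (2, \tau') = (2).
\]
% Neither factor is the unit ideal: the elements of (2, \tau) have the
% form u\tau + 2v with u, v ordinary integers, and 1 is not of that form.
% The two factors are different: an ideal containing both \tau and \tau'
% would contain \tau + \tau' = 1.
```

So $(2)$ is a product of two different ideals, neither of them the unit ideal, which is exactly the condition for 2 to split.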

By definition, two ideals are in the same ideal class
if when you multiply each by an appropriate principal
ideal you get the same ideal as a result. This is a natural
equivalence relation [I.2 §2.3] on ideals. It is
also one that respects products, meaning that if $I$ and
$J$ are two ideals, then the ideal class of their product
$I \cdot J$ depends only on the ideal classes of $I$ and $J$. (In
other words, if $I'$ is in the same ideal class as $I$ and
$J'$ is in the same ideal class as $J$, then $I' \cdot J'$ is in the
same ideal class as $I \cdot J$.) We can therefore say what we
mean by multiplication of ideal classes: to multiply two
classes, pick an ideal from each, multiply those, and
take the ideal class of the resulting product. The set
$H_d$ of ideal classes of $R_d$, given this operation of
multiplication, forms an Abelian group, in the sense that the
multiplication law we have just defined is associative
and commutative, and there are inverses. The identity
element is the class of the principal ideal $R_d$ itself. This group $H_d$,
the ideal class group, directly measures the extent to
which the ideals of the ring $R_d$ are principal: roughly
speaking it is what you get if you take the multiplicative
structure of all ideals and “divide out” by the principal
ideals.

As was mentioned in section 6, there is a close connection
between ideal classes and binary quadratic
forms. To begin to see this, take an ideal $I$ of $R_d$ and
write it as the set of all integer combinations of two
elements $\alpha$, $\beta$ of $R_d$. Then consider the norm function
on the elements of $I$, that is,

$$N(x\alpha + y\beta) = (x\alpha + y\beta)(x\alpha' + y\beta') = \alpha\alpha'x^2 + (\alpha\beta' + \alpha'\beta)xy + \beta\beta'y^2.$$

This is a binary quadratic form in the variable coefficients
$x$ and $y$. If you start with a different choice of $\alpha$,
$\beta$ that generate $I$ you get a different form, but the two
forms are scalar multiples of two forms with discriminant
$D$ that are equivalent to one another. Even better,
the equivalence class of these forms depends only on
the ideal class of $I$.
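
For $d = -23$ the recipe can be carried out in two lines. The following is an illustrative computation, with the generators chosen here as an assumption: $\alpha = 1$, $\beta = \tau_{-23}$ for the ring itself, and $\alpha = 2$, $\beta = \tau_{-23}$ for the ideal of integer combinations of 2 and $\tau_{-23}$.

```latex
% Write \tau = \tau_{-23}, with \tau + \tau' = 1 and \tau\tau' = 6.
% The ring R_{-23} itself (generators 1 and \tau):
\[
  N(x + y\tau) = x^2 + (\tau + \tau')\,xy + \tau\tau'\,y^2 = x^2 + xy + 6y^2 .
\]
% The ideal of integer combinations of 2 and \tau:
\[
  N(2x + y\tau) = 4x^2 + 2(\tau + \tau')\,xy + \tau\tau'\,y^2
                = 4x^2 + 2xy + 6y^2 = 2\,(2x^2 + xy + 3y^2).
\]
```

The principal ideal thus gives the fundamental form, while the second ideal gives twice the other form of discriminant $-23$ met in section 6, in line with the correspondence just described.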

It can be shown that there are only a finite number of
distinct ideal classes of $R_d$; that is, the ideal class group
$H_d$ is finite. The number of its elements is denoted $h_d$
and called the class number of $R_d$. So, the obstruction
to unique factorization of $R_d$ is given by the nontriviality
of the group $H_d$; equivalently, unique factorization
holds for $R_d$ if and only if its class number is 1. But
whether or not $H_d$ is trivial, its detailed group-theoretic
structure is profoundly related to the arithmetic of $R_d$.

The class number enters into the generalizations of 
formulas (5) and (10) of section 1; that is, the analytic 
formulas we alluded to in that section. These formu- 
las represent just the beginning of one of the ongoing 
chapters of our subject, and form a bridge between the 
world of discrete arithmetical issues and that of calcu- 
lus, infinite series, and volumes of spaces, all of which 
can be attacked by the methods of complex analysis 
[I.3 §5.6]. Here is a sample of them.


(i) If $d > 0$ is a square-free integer and $D$ is either
$d$ or $4d$ according to whether $d$ is congruent to 1
modulo 4 or not, then

$$h_d \cdot \frac{\log \varepsilon_d}{\sqrt{D}} = \frac{1}{2}\sum_{n > 0} \pm\frac{1}{n},$$

where the integers $n$ run through those that are
relatively prime to $D$ and the signs $\pm$ are chosen
in a way that depends only on the residue class of
$n$ modulo $D$.

(ii) If $d < 0$ we have a somewhat simpler formula:
there is no fundamental unit $\varepsilon_d$ in $R_d$ to contend
with, but when $d = -1$ or $-3$, there are more roots
of unity than merely $\pm 1$. If $w_d$ denotes the number
of roots of unity in $R_d$, then $w_{-1} = 4$, $w_{-3} = 6$ and
otherwise $w_d = 2$, and then one has a formula of
the following type:

$$\frac{h_d}{w_d\sqrt{|D|}} = \frac{1}{2\pi}\sum_{n > 0} \pm\frac{1}{n}.$$

As $d$ tends to $-\infty$ the class number $h_d$ tends to
infinity.
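
Formulas of this type can be tried out numerically. The sketch below is illustrative and rests on stated assumptions: it uses the standard normalization of Dirichlet's class number formula for negative discriminants, $h_d = \frac{w_d\sqrt{|D|}}{2\pi}\sum_{n>0} \pm\frac{1}{n}$, and it only treats $d = -p$ with $p > 3$ a prime congruent to 3 modulo 4, where $D = -p$, $w_d = 2$, and the sign attached to $n$ is $+1$ when $n$ is a nonzero square modulo $p$ and $-1$ otherwise. The function name `class_number_estimate` and the cutoff parameter `periods` are hypothetical, and the code does not check that `p` is prime.

```python
from math import pi, sqrt


def class_number_estimate(p, periods=2000):
    """Approximate h_d for d = -p (p > 3 prime, p = 3 mod 4) by truncating
    the series sum(+-1/n) after periods*p terms."""
    if p <= 3 or p % 4 != 3:
        raise ValueError("sketch only covers primes p > 3 with p = 3 mod 4")
    # Sign for each residue class mod p, via Euler's criterion.
    sign = [0] * p
    for r in range(1, p):
        sign[r] = 1 if pow(r, (p - 1) // 2, p) == 1 else -1
    partial_sum = sum(sign[n % p] / n for n in range(1, periods * p + 1))
    roots_of_unity = 2   # only +1 and -1 when d < -3
    return roots_of_unity * sqrt(p) / (2 * pi) * partial_sum


for p in (7, 23, 163):
    print(p, class_number_estimate(p))
```

The series converges slowly, so the output is only close to an integer. For $d = -23$ the estimate is near 3, the number of ideal classes. This is consistent with the two forms found in section 6, because the equivalence used there also allows changes of variables with $rv - su = -1$, which identifies $2X^2 + XY + 3Y^2$ with $2X^2 - XY + 3Y^2$.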


We have effective lower bounds for the growth of $h_d$
but these lower bounds are probably still far from the
actual growth (cf. Goldfeld 1985). The effective lower
bounds that are known are exceedingly weak. They follow,
however, from beautiful work of Goldfeld, and
of Gross and Zagier: for every real number $r < 1$
there is a computable constant $C(r)$ such that
$h_d > C(r)(\log |D|)^r$. Here is a sample:



if (D, 5077) = 1. 

It is a striking lacuna in our theory that, even today, 
nobody knows how to prove that there are infinitely 
many values of $d > 0$ for which $R_d$ enjoys the unique
factorization property — particularly since we expect 
that more than three quarters of them do! Our expec- 
tations are even more precise than that, thanks to 
Henri Cohen and Hendrik Lenstra, who make use of 
certain probabilistic expectations (now known as the 
Cohen-Lenstra heuristics) to conjecture that the density 
of positive fundamental discriminants of class num- 
ber 1 among all positive fundamental discriminants is 
0.75446.... 

8 The Elliptic Modular Function and 
the Unique Factorization Property 

A different obstruction to unique factorization in $R_d$ is
available when $d$ is negative. Now $R_d$ may be thought
of as a lattice in the complex plane (see figure 3), which
makes a wonderful tool available for us: the classical
elliptic modular function of Klein [VI.57],

$$j(z) = e^{-2\pi i z} + 744 + 196\,884\,e^{2\pi i z} + 21\,493\,760\,e^{4\pi i z} + 864\,299\,970\,e^{6\pi i z} + \cdots. \tag{13}$$

This function, also colloquially referred to as the
“$j$-function,” converges for complex numbers $z = x + iy$
with $y > 0$. If $z = x + iy$ and $z' = x' + iy'$ are two
such complex numbers, then $j(z) = j(z')$ if and only if
the lattice generated by $z$ and 1 in the complex plane is
the same as the lattice generated by $z'$ and 1 (or,
equivalently, $z' = (az + b)/(cz + d)$, where $a$, $b$, $c$, and $d$
are ordinary integers such that $ad - bc = 1$). We can
paraphrase this by saying that the value $j(z)$ depends
only on, and characterizes, the lattice generated by $z$
and 1.

It turns out (by a theorem of Schneider) that if an
algebraic number $\alpha = x + iy$ with $y > 0$ has the property
that $j(\alpha)$ is also algebraic, then $\alpha$ is a (complex)
quadratic irrationality; and the converse is also true. In
particular, since $\alpha = \tau_d$ is such a complex quadratic
irrationality when $d$ is negative, the value, $j(\tau_d)$, of the
$j$-function on $\tau_d$ is an algebraic number, in fact an
algebraic integer. This will be of some importance for
our story. First, since the ring $R_d$ as situated in the complex
plane is simply the lattice generated by $\tau_d$ and 1,
it follows from the previous paragraph that this value
$j(\tau_d)$ will be the same if we replace $\tau_d$ by any element
$\alpha$ of $R_d$, as long as the lattice generated by $\alpha$ and 1 is the
entire ring $R_d$. More importantly, $j(\tau_d)$ is an algebraic
integer of degree roughly comparable with the class
number of $R_d$. In particular, it is an ordinary integer
if and only if the ring $R_d$ has the unique factorization
property. (This result is one of the great applications
of a classical theory known as complex multiplication.)
In brief, here is yet another answer to the question of
when the unique factorization principle holds for $R_d$
when $d$ is negative: if $j(\tau_d)$ is an ordinary integer, the
answer is yes; otherwise it is no.

The search for the full list of negative values of d 
for which $R_d$ has the unique factorization property
makes a marvelous tale: there are precisely nine val- 
ues of d for which it occurs (see below), but for over 
two decades number theorists, while knowing these 
nine, could prove only that there were no more than 
ten. The history of how the nonexistence of a possible 
tenth value of d was established, and reestablished, is 
one of the thrilling chapters in our subject. K. Heegner,
in an article published in 1952, provided what he
claimed was a proof of the nonexistence of the possible
tenth value of $d$. However, Heegner’s proof was framed
in somewhat unfamiliar language and was not under- 
stood by the mathematicians of the time. His paper 
and his purported proof were largely forgotten until 
the late 1960s, when the nonexistence of the tenth
field was established (to the mathematical community’s
satisfaction) by Stark (1967) and independently, via a
different method, by Baker (1971). It was only then
that mathematicians took a second and closer look at
Heegner’s original article and discovered that he had
indeed proven exactly what he claimed. Moreover, his
proof offered an elegant direct conceptual road to an 
understanding of the underlying issue. 

Here are the nine values of $d$:

$$d = -1,\ -2,\ -3,\ -7,\ -11,\ -19,\ -43,\ -67,\ -163.$$

And here are the corresponding nine values of $j(\tau_d)$:

$$j(\tau_d) = 2^6 3^3,\ \ 2^6 5^3,\ \ 0,\ \ -3^3 5^3,\ \ -2^{15},\ \ -2^{15} 3^3,\ \ -2^{18} 3^3 5^3,\ \ -2^{15} 3^3 5^3 11^3,\ \ -2^{18} 3^3 5^3 23^3 29^3.$$

As Stark once pointed out, if, for some of these values
of $d$, you simply “plug” $\tau_d$ into the power series
expansion for $j$, you get rather surprising formulas. For
example, when $d = -163$, then

$$e^{-2\pi i \tau_d} = -e^{\pi\sqrt{163}}$$

is the first term of the power series for $j(\tau_{-163})$ (see
formula (13)). Since $j(\tau_{-163}) = -2^{18} 3^3 5^3 23^3 29^3$ and
since all the terms $e^{2\pi i n \tau_d}$ ($n > 0$) that appear in the
power series for the $j$-function are relatively small, we
find that $e^{\pi\sqrt{163}}$ is incredibly close to an integer. Indeed,
it is $2^{18} 3^3 5^3 23^3 29^3 + 744 + \cdots$, which works out as
$262\,537\,412\,640\,768\,744 - \epsilon$, where the error term $\epsilon$ is
less than $7.5 \times 10^{-13}$.
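
The numbers in this example are easy to reproduce. The following Python sketch is illustrative: the variable names are assumptions, and the error term is only estimated, in floating point, from the single series term $196\,884\,e^{2\pi i \tau_d}$ of formula (13), with all later terms ignored.

```python
from math import exp, pi, sqrt

# Exact integer arithmetic for -j(tau_{-163}) and the nearby integer.
cube = 2**18 * 3**3 * 5**3 * 23**3 * 29**3
nearby_integer = cube + 744

# For d = -163, e^{2 pi i tau_d} = -e^{-pi sqrt(163)}, so the leading
# correction to e^{pi sqrt(163)} is 196884 * e^{-pi sqrt(163)}.
epsilon_estimate = 196884 * exp(-pi * sqrt(163))

print(nearby_integer)
print(cube == 640320**3)
print(epsilon_estimate)
```

The second line of output reflects the fact that the product of cubes is itself the cube of $2^6 \cdot 3 \cdot 5 \cdot 23 \cdot 29 = 640\,320$.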

9 Representations of Prime Numbers 
by Binary Quadratic Forms 

More often than you might expect, it turns out to be
possible to translate difficult and/or somewhat artificial
problems about ordinary integers into natural and
tractable problems about larger rings of algebraic integers.
My favorite elementary example of this type is the
theorem due to Fermat [VI.12] that if a prime number $p$
may be expressed as a sum of two squares, $p = a^2 + b^2$
with $0 < a \leq b$, then it has only one such expression.
(For example, $1^2 + 10^2$ is the only way of expressing the
prime number 101 as the sum of two squares.) Moreover,
a prime number $p$ can be expressed as a sum of
two squares if and only if $p = 2$ or $p$ is of the form
$4k + 1$. (The “only if” part of this is easy to see: since
any square is congruent either to 0 or to 1 mod 4, an
odd integer that is a sum of two squares is necessarily
congruent to 1 mod 4.) These statements about ordinary
integers can be translated into basic statements
about the ring of Gaussian integers. For if we write
$a^2 + b^2 = (a + ib)(a - ib)$, with $i = \sqrt{-1}$, then we can
view $a^2 + b^2$ as the norm of the (conjugate) elements
$a \pm ib$ in the ring of Gaussian integers. So, if $p$ is a
prime number that admits an expression as a sum of
squares, $p = a^2 + b^2$, it follows that each of the elements
$a \pm ib$ has norm a prime integer. It is easy to
deduce that each of $a \pm ib$ is itself a prime in the ring of
Gaussian integers. Indeed, any factorization of $a \pm ib$ into a
product of two Gaussian integers would have the property
that the norms of the factors are ordinary integers
which multiply out to be the prime $p$, and this severely
limits their possibilities: one of them has to be a unit.

In other words, whenever $p = a^2 + b^2$, then

$$p = (a + ib)(a - ib)$$

is a factorization of the ordinary integer prime $p$ into
a product of two Gaussian integer primes. The uniqueness
part of Fermat’s theorem then follows from (in
fact, it is readily seen to be equivalent to) the unique
factorization property of the ring $R_{-1}$ of Gaussian integers.
That any prime number $p$ of the form $4k + 1$
admits such an expression as a sum of two squares
follows from the splitting law for primes $p$ in the ring
of Gaussian integers: an odd prime number $p$ is a
norm, and hence splits into the product of two distinct
primes, in the ring of Gaussian integers if and
only if $p$ is congruent to 1 mod 4. This result is just
the beginning of an immense chapter of arithmetic.
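
Both halves of Fermat's theorem can be observed experimentally for small primes. The brute-force Python sketch below is illustrative only; the function names `is_prime` and `two_square_expressions` are assumptions, and the search is a plain enumeration rather than anything using Gaussian integers.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % k for k in range(2, isqrt(n) + 1))


def two_square_expressions(n):
    """All pairs (a, b) with 0 < a <= b and a*a + b*b == n."""
    found = []
    for a in range(1, isqrt(n // 2) + 1):   # a <= b forces 2*a*a <= n
        remainder = n - a * a
        b = isqrt(remainder)
        if b * b == remainder:
            found.append((a, b))
    return found


print(two_square_expressions(101))
print(two_square_expressions(65))   # a composite number can have several

# For each prime below 500: exactly one expression when p = 2 or
# p = 1 mod 4, and none otherwise.
pattern_holds = all(
    len(two_square_expressions(p)) == (1 if p == 2 or p % 4 == 1 else 0)
    for p in range(2, 500)
    if is_prime(p)
)
print(pattern_holds)
```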

10 Splitting Laws and the Race 
between Residues and Nonresidues 

The simple splitting law for ordinary prime integers $p$
in the ring of Gaussian integers, which states that $p$
splits if $p \equiv 1 \bmod 4$ and not if $p \equiv -1 \bmod 4$, invites
us to ask how often each of these cases occurs (see figure 4).
Dirichlet [VI.36] proved a famous theorem that
says that there are infinitely many primes in the arithmetic
progression $c, m + c, 2m + c, \ldots$ if the integers
$m$ and $c$ are relatively prime. A more precise version of
his result gives a clear asymptotic answer to the question
we have just asked: as $x$ goes to infinity, the ratio
of the number of primes less than $x$ that split to the
number that do not tends to 1. (See analytic number
theory [IV.2 §4] for a further discussion of Dirichlet’s
theorem.)

For fun, one might ask a fussier question: which
type of prime less than $x$ is actually in greater abundance,
the nonsplit primes or the split ones (see figure 4)?
To put some perspective on this, let us widen
our query: for $g$ equal either to 4 or to an odd prime,
let $A(x)$ be the number of primes $\ell < x$ that are quadratic
residues modulo $g$ and let $B(x)$ be the number
of primes $\ell < x$ that are quadratic nonresidues modulo
$g$. Let $D(x) = A(x) - B(x)$ be the difference; what
does $D(x)$ look like?
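
For the modulus 4 the race is simple to simulate. The Python sketch below is illustrative: the function names `primes_below` and `gaussian_race` are assumptions, and it treats only the Gaussian case, where the residues are the primes congruent to 1 mod 4 (the split ones) and the nonresidues are the primes congruent to 3 mod 4.

```python
from math import isqrt


def primes_below(x):
    """Primes p < x by the sieve of Eratosthenes."""
    if x < 2:
        return []
    flags = [True] * x
    flags[0] = flags[1] = False
    for k in range(2, isqrt(x - 1) + 1):
        if flags[k]:
            for multiple in range(k * k, x, k):
                flags[multiple] = False
    return [n for n in range(x) if flags[n]]


def gaussian_race(x):
    """Return (A, B, A - B): counts of primes below x that are 1 mod 4
    (split) and 3 mod 4 (nonsplit), and their difference."""
    primes = primes_below(x)
    split = sum(1 for p in primes if p % 4 == 1)
    nonsplit = sum(1 for p in primes if p % 4 == 3)
    return split, nonsplit, split - nonsplit


for x in (100, 1000, 10000):
    print(x, gaussian_race(x))
```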

For an absorbing account of the history and status 
of this problem, see the article “Prime number races” 
by Andrew Granville and Greg Martin in American 
Mathematical Monthly. 

11 Algebraic Numbers and Algebraic Integers

Now that we have seen the algebraic integers $j(\tau_d)$ for
negative values of $d$, and have touched on trigonometric
sums, we have a few hints that, as with ordinary integers,
the deep structure of these rings of quadratic integers
may be better understood within a larger context
of algebraic numbers. So now let us deal with algebraic
numbers in full generality.

Figure 4 The higher of the two graphs in the figure represents
the number of primes less than X that remain prime
in the ring of Gaussian integers, and the lower represents
the number of primes less than X that split in the ring
of Gaussian integers. The third graph hovering around the
x-axis represents the difference between the two numbers.
We thank William Stein for this data.

By a monic polynomial, we mean a polynomial of the
form

$$P(X) = X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n,$$

i.e., a polynomial of degree $n$ such that the coefficient
of $X^n$ is 1. In general, the other coefficients are just
assumed to be complex numbers. If
$P(X) = X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n$ is such a polynomial, and
if $\Theta$ is a complex number such that $P(\Theta) = 0$, or,
equivalently, if $\Theta$ satisfies the polynomial equation

$$\Theta^n + a_1 \Theta^{n-1} + \cdots + a_{n-1} \Theta + a_n = 0,$$

we say that $\Theta$ is a root of the polynomial $P(X)$. The
fundamental theorem of algebra [V.15], initially
proved by Gauss, guarantees that any such polynomial
of degree $n$ factors into a product of $n$ linear
polynomials. That is,

$$P(X) = (X - \Theta_1)(X - \Theta_2) \cdots (X - \Theta_n)$$

for some complex numbers $\Theta_1, \Theta_2, \ldots, \Theta_n$ that are in
fact precisely the roots of the polynomial $P(X)$.

If $\Theta$ is a root of such a polynomial
$P(X) = X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n$ and if in addition the
coefficients $a_i$ are rational numbers, then $\Theta$ is called an
algebraic number. If the coefficients are not just rational
but are in fact integers, then $\Theta$ is called an algebraic
integer. So, for example, the square root of any rational
number is an algebraic number and the square root of
any “ordinary” integer is an algebraic integer. The same
holds true for $n$th roots of ordinary integers, or of
algebraic integers, for any natural number $n$. For an example
of a different sort, we have already mentioned the
theorem that the values of the $j$-function on complex
quadratic irrational integers are algebraic integers. For
a (random) particular case of that theorem, the complex
number $j(\tau_{-23})$ is a root of the monic polynomial

$$X^3 + 3\,491\,750\,X^2 - 5\,151\,296\,875\,X + 12\,771\,880\,859\,375.$$

An exercise: show that any algebraic number can be
expressed as an algebraic integer divided by an ordinary
integer.


12 Presentation of Algebraic Numbers 


In dealing with any mathematical concept, we confront,
in one way or another, the dual problem of the various
forms in which it comes to us when it arises in our
work, and the various ways we can present it so as to
deal with it effectively. We have already seen a bit of
this at the outset of this article, in our discussion of
quadratic surds, and we will continue to see it in our
treatment of them below, where the various modes in
which quadratic surds can be presented (as radicals, as
eventually recurrent continued fractions, or as trigonometric
sums) come together, all contributing to their
unified theory.

This issue of presentation is all the more of a problem
with algebraic numbers in general, which may come to
us in a multitude of ways. For example, they can arise as
the coordinates of points on specific algebraic varieties
whose defining equations may not be easily available,
or as special values of functions like the $j$-function. It
is natural, then, to look for some uniform way of presenting
algebraic numbers, and the history of the subject
shows how much effort has been devoted to such
a search. For example, consider the focus on iterated
radical expressions, as in the famous formula for the
solution to the general cubic equation $X^3 = bX + c$
given by

$$X = \sqrt[3]{\frac{c}{2} + \sqrt{\frac{c^2}{4} - \frac{b^3}{27}}} + \sqrt[3]{\frac{c}{2} - \sqrt{\frac{c^2}{4} - \frac{b^3}{27}}}, \qquad (14)$$

or the corresponding general solution to the fourth-
degree equation. These were major achievements of six- 
teenth-century Italian algebra, and they culminated in 
the proof that the general fifth-degree algebraic num- 
ber could not be so expressed, which was a major 
achievement of the early nineteenth century (see the 
insolubility of the quintic [V.24]). The challenge 
to give some analytic expression for such fifth-degree 
algebraic numbers was the source of a classic book by 
Klein, The Icosahedron, written in the late nineteenth 
century. Kronecker wrote that it was the “dream of his 
youth” (his Jugendtraum) to establish a uniform mode 
of presentation for a class of algebraic numbers that 
interested him, by expressing them as values of certain
analytic functions. 

13 Roots of Unity

A central role in the theory of algebraic numbers is
played by the roots of unity, that is, the $n$ complex
solutions of the equation $X^n = 1$, or equivalently the $n$
roots of the polynomial $X^n - 1$. If we let $\zeta_n = e^{2\pi i/n}$,
then these roots are precisely $\zeta_n$ and its powers, so in
particular they are algebraic integers. They give us the
factorization

$$X^n - 1 = (X - 1)(X - \zeta_n)(X - \zeta_n^2) \cdots (X - \zeta_n^{n-1}).$$

Now the powers of $\zeta_n$ form the vertices of a regular
$n$-gon in the complex plane, centered at the origin. This
has the following consequence, noticed by Gauss in
his youth. It can be shown that compass and straightedge
constructions allow us, in effect, to extract square
roots, so whenever $\zeta_n$ can be given as an expression
built out of just square roots and the usual arithmetical
operations, we have, implicitly, a ruler-and-compass
construction of the regular $n$-gon, and conversely.

To get some idea of why square roots are so closely
connected with these constructions, consider this. If we
have given ourselves a unit measure, which we can view
as the distance between the numbers 0 and 1 in the
(complex) plane, and if we have already constructed,
by whatever device, a specific point, $x$ say, between 0
and 1 on the horizontal axis of the plane, we can first
“construct” $x/2$ by straightedge and compass, and then
go on to form a right-angled triangle with hypotenuse
of length $1 + x/2$ and one of its other sides of length
$1 - x/2$ (again using a straightedge and compass). The
Pythagorean theorem gives us that the third side of
that triangle is of length $\sqrt{2x}$, since
$(1 + x/2)^2 - (1 - x/2)^2 = 2x$. If one follows this line of
thought (but adapts it to deal with complex quantities
as well as the real number $x$ as in the example we have
just discussed), then one can see that the equations

$$\zeta_3 = \tfrac{1}{2}(-1 + i\sqrt{3}),$$

$$\zeta_4 = \sqrt{-1},$$

$$\zeta_5 = \tfrac{1}{4}(\sqrt{5} - 1) + i\,\tfrac{1}{4}\sqrt{10 + 2\sqrt{5}},$$

$$\zeta_6 = \tfrac{1}{2}(1 + i\sqrt{3})$$

provide (implicit) constructions of the equilateral triangle,
the square, the regular pentagon, and the regular
hexagon, respectively. By contrast, $\zeta_7$ cannot be
expressed solely in terms of the arithmetical operations
and square roots (it is the root of a quadratic equation
with coefficients that are rational expressions in the
roots of the irreducible cubic polynomial
$X^3 - \frac{7}{3}X + \frac{7}{27}$), which already suggests that the regular heptagon
might fail to be constructible by the standard classical
means, and indeed it does fail without some act of
“angle trisection.” (In principle, though, the reader can
work out an expression for $\zeta_7$ in terms of square roots
and cube roots by means of the information provided in
the parenthetical phrase above, together with equation
(14).)
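
The square-root expressions for the small roots of unity can be checked numerically. The following is only a floating-point sanity check, not a construction: the function name `zeta` and the dictionary `closed_forms` are illustrative choices, and the closed forms are the standard values of $\cos(2\pi/n)$ and $\sin(2\pi/n)$ for $n = 3, 4, 5, 6$.

```python
import cmath
import math


def zeta(n: int) -> complex:
    """Primitive n-th root of unity e^{2*pi*i/n}."""
    return cmath.exp(2j * math.pi / n)


# Closed forms that use only rational numbers and square roots.
closed_forms = {
    3: complex(-1 / 2, math.sqrt(3) / 2),
    4: 1j,
    5: complex((math.sqrt(5) - 1) / 4, math.sqrt(10 + 2 * math.sqrt(5)) / 4),
    6: complex(1 / 2, math.sqrt(3) / 2),
}

# Distance between each closed form and the exponential definition.
errors = {n: abs(zeta(n) - value) for n, value in closed_forms.items()}

for n, err in sorted(errors.items()):
    print(f"n = {n}: |zeta_n - closed form| = {err:.2e}")
```

Each reported distance should be at the level of rounding error, which is what one expects if the four expressions really are $\zeta_3$, $\zeta_4$, $\zeta_5$, and $\zeta_6$.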

Gauss showed that if $n > 2$ is a prime number then
the regular $n$-gon is classically constructible if and only
if $n$ is a Fermat prime, that is, a prime number of the
form $2^{2^a} + 1$. So, for example, the 11-gon and 13-gon
are not constructible by classical means, but since $\zeta_{17}$
is expressible as nested rational expressions of square
roots, the 17-gon is, famously, constructible.

So, not all roots of unity can be expressed as iterated
rational expressions of square roots. However, this
inhospitability is not mutual, since all square roots of
integers can be expressed as integer combinations of
roots of unity. More mysteriously, the elusive fundamental
units $\varepsilon_d$ (for $d$ positive), for which there is no
known formula, are intimately related to a unit $c_d$ in
$R_d$ which is an explicit rational expression of roots
of unity. (See below: it is called a circular unit.) This
satisfies the elegant formula

$$c_d = \varepsilon_d^{h_d}, \qquad (15)$$

which establishes yet another explicit test of unique
factorization: the equality $c_d = \varepsilon_d$ is a “litmus” requirement
for the unique factorization principle to hold in
$R_d$.

To give the flavor of the formulas involved, let $p$ be
an odd prime number and let $a$ be an integer not divisible
by $p$. Then define $\sigma_p(a)$ to be $+1$ if $a$ is a quadratic
residue modulo $p$, that is, if $a$ is congruent to
the square of an integer modulo $p$, and $-1$ if not. The
simple trigonometric sums of (1) and (6) generalize to
quadratic Gauss sums:

$$\pm i^{(p-1)/2} \sqrt{p} = \zeta_p + \sigma_p(2)\zeta_p^2 + \sigma_p(3)\zeta_p^3 + \cdots + \sigma_p(p-2)\zeta_p^{p-2} + \sigma_p(p-1)\zeta_p^{p-1}. \qquad (16)$$

This formula is not too hard to prove, apart from determining
which sign is correct in the initial $\pm$, but after
considerable efforts Gauss managed to work this out
too. To see the connection between, say, formula (6)
and (16) note that when $p = 5$, the left-hand side of
(16) is $\sqrt{5}$ and the right-hand side is

$$\zeta_5 - \zeta_5^2 - \zeta_5^{-2} + \zeta_5^{-1} = 2\cos\tfrac{2}{5}\pi - 2\cos\tfrac{4}{5}\pi.$$
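
A quadratic Gauss sum is easy to evaluate numerically, which gives a quick way to see the $\sqrt{p}$ on the left-hand side emerge. The sketch below is a floating-point illustration only; the helper names `quadratic_character` and `gauss_sum` are hypothetical, and the character is computed with Euler's criterion, a standard fact that is not stated in the text above.

```python
import cmath
import math


def quadratic_character(a: int, p: int) -> int:
    """+1 if a is a quadratic residue modulo the odd prime p, -1 if not.

    Uses Euler's criterion: a^((p-1)/2) is congruent to +1 or -1 mod p
    whenever a is not divisible by p.
    """
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def gauss_sum(p: int) -> complex:
    """Sum of sigma_p(a) * zeta_p^a over a = 1, ..., p - 1."""
    zeta_p = cmath.exp(2j * math.pi / p)
    return sum(quadratic_character(a, p) * zeta_p**a for a in range(1, p))


for p in (3, 5, 7, 11, 13, 17):
    g = gauss_sum(p)
    print(p, f"{g.real:+.6f} {g.imag:+.6f}i", f"sqrt(p) = {math.sqrt(p):.6f}")
```

For the primes tried here the sum comes out as $\sqrt{p}$ when $p$ leaves remainder 1 on division by 4 and as $i\sqrt{p}$ when it leaves remainder 3, in line with the sign that Gauss determined.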

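As for the circular unit $c_p$, it is defined to be

$$\prod_{a=1}^{(p-1)/2} (\zeta_p^a - \zeta_p^{-a})^{\sigma_p(a)} = \prod_{a=1}^{(p-1)/2} \sin(2\pi a/p)^{\sigma_p(a)},$$

and this leads to further formulas. For example, when
$p = 5$, we have $\varepsilon_p = \tau_5 = \frac{1}{2}(1 + \sqrt{5})$, and since $h_5 = 1$,
formula (15) for $p = 5$ tells us that

$$\frac{1 + \sqrt{5}}{2} = \frac{\zeta_5 - \zeta_5^{-1}}{\zeta_5^2 - \zeta_5^{-2}} = \frac{\sin\frac{2}{5}\pi}{\sin\frac{4}{5}\pi}.$$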

14 The Degree of an Algebraic Number 

If $\theta$ is an algebraic integer that is also a rational number,
then $\theta$ is an “ordinary” integer. Here is the proof
of this fact. If $\theta$ is a rational number, then we may
write $\theta = C/D$ as a fraction in lowest terms, with $D \geq 1$. If $\theta$
is also an algebraic integer, then it is the root of a
monic polynomial with rational integer coefficients,
$X^n + a_1 X^{n-1} + \cdots + a_n$, so we have an equation

$$(C/D)^n + a_1 (C/D)^{n-1} + \cdots + a_{n-1}(C/D) + a_n = 0.$$

Multiplying through by $D^n$ we get

$$C^n + a_1 C^{n-1} D + \cdots + a_{n-1} C D^{n-1} + a_n D^n = 0,$$

where all terms are (ordinary) integers, and all but the
first are divisible by $D$. If $D > 1$ then it has some
prime factor $p$, so all terms apart from the first are also
divisible by $p$. Since the terms add up to zero, it follows
that $p$ divides $C^n$, which implies that $p$ divides $C$, which
contradicts the assertion that the fraction $C/D$ is in its
lowest terms. Hence $D = 1$, and $\theta = C$ is an ordinary
integer. As the reader may like to verify, this fact
implies the result attributed to Theaetetus above, that
$\sqrt{A}$ is irrational if and only if $A$ is not a perfect square.


The degree of an algebraic number $\theta$ is defined to
be the smallest degree, $n$, of any polynomial relation
$\theta^n + a_1 \theta^{n-1} + \cdots + a_{n-1}\theta + a_n = 0$ that $\theta$ satisfies,
where the coefficients $a_i$ are rational numbers. The corresponding
polynomial, $P(X) = X^n + a_1 X^{n-1} + \cdots +
a_{n-1} X + a_n$, is unique, since if there were two of them
then their difference would be of smaller degree and
would also have $\theta$ as a root. (One could make it monic
by dividing it through by the leading coefficient.) Let
us call $P(X)$ the minimal polynomial of $\theta$. The minimal
polynomial is irreducible over the field of rational
numbers: that is, it cannot be factored as a product
of two polynomials, each of smaller degree and having
rational numbers as coefficients. (If it could, then
it would not be of minimal degree, since one of its factors
would have $\theta$ as a root.) The minimal polynomial
$P(X)$ of $\theta$ is a factor of any monic polynomial $G(X)$
with rational coefficients that has $\theta$ as root. (The greatest
common divisor of $P$ and $G$ is another monic polynomial
with rational coefficients that has $\theta$ as a root,
so it cannot be of degree smaller than that of $P$ and it
must therefore be $P$.) The minimal polynomial $P(X)$ of
$\theta$ has distinct roots. (If $P(X)$ had multiple roots, then
a little elementary calculus shows that it would share
a nontrivial factor with its derivative, $P'(X)$. Since the
derivative is of lower degree than $P(X)$ and again has
rational coefficients, the greatest common divisor of $P$
and $P'$ would provide a nontrivial factorization of $P(X)$,
contradicting its irreducibility.)

A fundamental result due to Gauss is that the $n$th
root of unity $\zeta_n = e^{2\pi i/n}$ is an algebraic integer of
degree precisely $\phi(n)$, where $\phi$ is Euler’s $\phi$-function.
For example, if $p$ is prime, the minimal polynomial of
$\zeta_p$ is

$$\frac{X^p - 1}{X - 1} = X^{p-1} + X^{p-2} + \cdots + X + 1,$$

which is of degree $\phi(p) = p - 1$.
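
The degree count can be explored by computing cyclotomic polynomials directly. The sketch below assumes the standard recursive description (divide $X^n - 1$ by the cyclotomic polynomials of all proper divisors of $n$), which is not spelled out in the text; the function names are illustrative. It confirms only that the resulting polynomial has degree $\phi(n)$; the irreducibility that makes it the minimal polynomial is the substance of Gauss's result and is not tested here.

```python
from functools import lru_cache
from math import gcd


def divide_by_monic(numerator, denominator):
    """Exact quotient of two integer polynomials.

    Polynomials are coefficient sequences, lowest degree first, and the
    denominator is assumed to be monic and to divide the numerator.
    """
    remainder = list(numerator)
    quotient = [0] * (len(remainder) - len(denominator) + 1)
    for i in range(len(quotient) - 1, -1, -1):
        coeff = remainder[i + len(denominator) - 1]
        quotient[i] = coeff
        for j, d in enumerate(denominator):
            remainder[i + j] -= coeff * d
    assert not any(remainder), "division was not exact"
    return quotient


@lru_cache(maxsize=None)
def cyclotomic(n: int):
    """Coefficients (lowest degree first) of the n-th cyclotomic polynomial."""
    poly = [-1] + [0] * (n - 1) + [1]  # X^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = divide_by_monic(poly, cyclotomic(d))
    return tuple(poly)


def euler_phi(n: int) -> int:
    """Number of integers in 1..n that are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


for n in (5, 7, 12):
    print(n, cyclotomic(n), "degree", len(cyclotomic(n)) - 1, "phi", euler_phi(n))
```

For a prime $p$ the output is the all-ones coefficient list of $X^{p-1} + \cdots + X + 1$.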

15 Algebraic Numbers as Ciphers Determined 
by Their Minimal Polynomials 

We have expressly insisted that our algebraic numbers
are complex numbers (of a certain sort). But another
possible attitude toward an algebraic number, $\theta$, an
attitude at times promoted by Kronecker, among others,
is to deal with $\theta$ as an unknown satisfying only the
algebraic relations implied by the fact that it is a root
of its (unique monic) minimal polynomial with rational
coefficients. For example, if the minimal polynomial of
$\theta$ is $P(X) = X^3 - X - 1$, then, according to this view,
$\theta$ is just an algebraic symbol that comes with the rule
that any occurrence of $\theta^3$ may be replaced by $\theta + 1$
(rather as the complex number $i$ can be regarded as
a symbol with the property that $i^2$ may be replaced
by $-1$). Any root of the minimal polynomial of $\theta$ satisfies
all the same polynomial relations with rational
coefficients that $\theta$ satisfies; these roots are called conjugates
of $\theta$. If $\theta$ is an algebraic number of degree $n$,
then $\theta$ has $n$ distinct conjugates, all of them again, of
course, algebraic numbers.
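
This symbolic point of view translates directly into a small calculator. In the sketch below an element $c_0 + c_1\theta + c_2\theta^2$ is stored as a triple of rationals, and the only rule used is that $\theta^3$ may be replaced by $\theta + 1$; the names `multiply`, `power`, and `THETA` are illustrative, and nothing about the complex value of $\theta$ is needed.

```python
from fractions import Fraction

# theta itself, written as 0 + 1*theta + 0*theta^2.
THETA = (Fraction(0), Fraction(1), Fraction(0))


def multiply(u, v):
    """Multiply two elements c0 + c1*theta + c2*theta^2.

    The raw product has terms up to theta^4; these are rewritten with
    theta^3 = theta + 1, i.e. theta^k = theta^(k-2) + theta^(k-3).
    """
    raw = [Fraction(0)] * 5
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            raw[i + j] += Fraction(a) * Fraction(b)
    for k in (4, 3):  # reduce the highest power first
        raw[k - 2] += raw[k]
        raw[k - 3] += raw[k]
        raw[k] = Fraction(0)
    return tuple(raw[:3])


def power(u, n: int):
    """n-th power by repeated multiplication (n >= 0)."""
    result = (Fraction(1), Fraction(0), Fraction(0))
    for _ in range(n):
        result = multiply(result, u)
    return result


print("theta^3 =", power(THETA, 3))   # coefficients of 1 + theta
print("theta^5 =", power(THETA, 5))   # coefficients of 1 + theta + theta^2
print("theta * (theta^2 - 1) =", multiply(THETA, (-1, 0, 1)))
```

The last line shows that $\theta^2 - 1$ behaves as $1/\theta$, a relation that holds equally for every conjugate of $\theta$, since it follows from the minimal polynomial alone.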

16 A Few Remarks about the 
Theory of Polynomials 

Central to the theory of polynomials in one variable
(and, therefore, particularly to the theory of algebraic
numbers) is the general relationship that roots have
to coefficients:

$$\prod_{i=1}^{n} (X - T_i) = X^n + \sum_{j=1}^{n} (-1)^j A_j(T_1, T_2, \ldots, T_n) X^{n-j}.$$

The polynomial $A_j(T_1, T_2, \ldots, T_n)$ is homogeneous of
degree $j$ (this means that every monomial in it has total
degree $j$), has integer coefficients, and is symmetric in
(i.e., unchanged by any permutation of) the variables
$T_1, T_2, \ldots, T_n$.

Up to sign, the constant term is the product of the roots:

$$A_n(T_1, T_2, \ldots, T_n) = T_1 \cdot T_2 \cdots T_n,$$

which is known as the norm form. Up to sign, the coefficient of
$X^{n-1}$ is the sum of the roots:

$$A_1(T_1, T_2, \ldots, T_n) = T_1 + T_2 + \cdots + T_n,$$

and this is the trace form.

When $n = 2$ the norm and trace are all the symmetric
polynomials in the list. For $n = 3$, beyond the norm and
trace we also have the symmetric polynomial of degree 2:

$$A_2(T_1, T_2, T_3) = T_1 T_2 + T_2 T_3 + T_3 T_1 = \tfrac{1}{2}\{(T_1 + T_2 + T_3)^2 - (T_1^2 + T_2^2 + T_3^2)\}.$$

It is of major importance to this theory, and more
specifically to Galois theory [V.24], that the symmetry
properties of the conjugate roots are nicely reflected
in these symmetric polynomials. In particular, we have
the fundamental result that any symmetric polynomial
in $T_1, T_2, \ldots, T_n$ with rational coefficients can be
expressed as a polynomial with rational coefficients in
the symmetric polynomials $A_j(T_1, T_2, \ldots, T_n)$, and similarly
with integral coefficients. For example, the equation
above shows that $T_1^2 + T_2^2 + T_3^2$ can be expressed
as

$$A_1(T_1, T_2, T_3)^2 - 2A_2(T_1, T_2, T_3).$$
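
The relationship between roots and coefficients can be tested on concrete integers. The sketch below expands $\prod (X - T_i)$ for sample values and compares the coefficients with $(-1)^j A_j$, computing each $A_j$ as the sum of all products of $j$ of the values; the function names and the sample roots 2, 3, 5 are arbitrary illustrative choices.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(j: int, values) -> int:
    """A_j: the sum of the products of all j-element subsets of values."""
    return sum(prod(subset) for subset in combinations(values, j))


def expand_product(roots):
    """Coefficients of prod (X - t), highest degree first."""
    coeffs = [1]
    for t in roots:
        # Multiplying by (X - t): new[k] = old[k] - t * old[k - 1].
        coeffs = [a - t * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


roots = [2, 3, 5]
coefficients = expand_product(roots)
signed_symmetric = [(-1) ** j * elementary_symmetric(j, roots)
                    for j in range(len(roots) + 1)]
print(coefficients)        # coefficients of X^3 - 10 X^2 + 31 X - 30
print(signed_symmetric)    # the same list, built from A_0, ..., A_3

# Sum of squares of the roots, written in terms of A_1 and A_2.
a1 = elementary_symmetric(1, roots)
a2 = elementary_symmetric(2, roots)
print(sum(t * t for t in roots), a1 * a1 - 2 * a2)
```

Here $A_0$ is taken to be 1 (the empty product), so the two lists agree in every position including the leading coefficient.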


17 Fields of Algebraic Numbers
and Rings of Algebraic Integers 

The inverse of a nonzero algebraic number is again an
algebraic number; the sum, difference, and product of
two algebraic numbers are algebraic numbers; the sum,
difference, and product of two algebraic integers are
algebraic integers. The neat proofs of these (latter) facts
are a good demonstration of the power of linear algebra,
and in particular of a consequence of Cramer’s rule:
any matrix with integer coefficients (and therefore also
any linear transformation of a finite-dimensional vector
space that preserves an integer lattice) satisfies a
monic polynomial identity with integer coefficients.

To see just how useful this remark is for finding polynomial
relations, and more specifically for showing that
the collections of algebraic numbers and algebraic integers
are closed under sums and products, try your hand
at showing that $\sqrt{2} + \sqrt{3}$ is an algebraic integer. One
way to do it is to search for the monic fourth-degree
polynomial equation that it satisfies. But this is hardly a
beautiful calculation! If, however, you are familiar with
linear algebra, then a less painful route is to form the
four-dimensional vector space over the rational numbers,
generated by $1$, $\sqrt{2}$, $\sqrt{3}$, and $\sqrt{6}$ (which are linearly
independent when the scalars are rational). Multiplication
by $\sqrt{2} + \sqrt{3}$ defines a linear transformation $T$ of
this vector space, and one can compute its characteristic
polynomial $P$. The Cayley-Hamilton theorem says
that $P(T) = 0$, and this translates into the statement
that $\sqrt{2} + \sqrt{3}$ is a root of $P$.
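
The linear-algebra route can be carried out mechanically. In the sketch below the matrix `T` records multiplication by $\sqrt{2} + \sqrt{3}$ in the basis $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$ (each column lists the coordinates of the image of one basis vector), and the characteristic polynomial is computed with the Faddeev-LeVerrier recursion. That recursion is just one convenient exact method chosen for this illustration, not something the text prescribes, and the function names are hypothetical.

```python
from fractions import Fraction

# Multiplication by sqrt(2) + sqrt(3) on the basis (1, sqrt2, sqrt3, sqrt6):
#   1     -> sqrt2 + sqrt3
#   sqrt2 -> 2 + sqrt6
#   sqrt3 -> 3 + sqrt6
#   sqrt6 -> 3*sqrt2 + 2*sqrt3
# T[i][j] is the coefficient of basis vector i in the image of basis vector j.
T = [
    [0, 2, 3, 0],
    [1, 0, 0, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
]


def mat_mul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def characteristic_polynomial(a):
    """Coefficients of det(X*I - A), highest degree first.

    Faddeev-LeVerrier: M_k = A*M_(k-1) + c*I with c the latest coefficient,
    and the next coefficient is -trace(A*M_k)/k.
    """
    n = len(a)
    coeffs = [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        m = mat_mul(a, m)
        for i in range(n):
            m[i][i] += coeffs[-1]
        am = mat_mul(a, m)
        coeffs.append(-sum(am[i][i] for i in range(n)) / k)
    return coeffs


P = characteristic_polynomial(T)
print([int(c) for c in P])  # coefficients of X^4 - 10 X^2 + 1
```

One can confirm the answer by hand: $(\sqrt{2} + \sqrt{3})^2 = 5 + 2\sqrt{6}$, so $(x^2 - 5)^2 = 24$, which is $x^4 - 10x^2 + 1 = 0$.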

These “closure properties” we have just discussed
lead us to study, in complete generality, fields of algebraic
numbers and rings of algebraic integers. A number
field is a field that is generated (as a field) by finitely
many algebraic numbers. A standard result tells us that
any number field $K$ can in fact be generated by a single
carefully chosen algebraic number. The degree of
this algebraic number equals the degree of $K$, which is
defined to be the dimension of $K$ when $K$ is viewed as a
vector space over the field $\mathbb{Q}$ of rational numbers. One
of the main introductory observations of Galois theory
is that if $K$ is a number field of degree $n$, then there
are exactly $n$ distinct ring homomorphisms (“imbeddings”)
$\iota : K \to \mathbb{C}$ from $K$ into the field of complex
numbers. (This means that $\iota$ sends 1 to 1 and respects
the addition and multiplication laws within $K$. That is,
$\iota(x + y) = \iota(x) + \iota(y)$ and $\iota(x \cdot y) = \iota(x) \cdot \iota(y)$.) From
these imbeddings, we can construct some very useful
rational-valued functions on $K$. For any element $x$ in $K$,
we form the $n$ complex numbers $x_1, x_2, \ldots, x_n$ that are
the images of $x$ under the $n$ different imbeddings of $K$
into $\mathbb{C}$. We then let

$$a_j(x) = A_j(x_1, x_2, \ldots, x_n),$$

where $A_j(X_1, X_2, \ldots, X_n)$ is the $j$th symmetric polynomial
of section 16 above. (Because the polynomials $A_j$
are symmetric, we do not have to worry about the order
of the images $x_1, x_2, \ldots, x_n$ in the above expression.) It
is not immediately obvious that the values of $a_j$ are
rational numbers, but there is a theorem that tells us
this.

If an algebraic number $\theta$ in $K$ generates $K$ (as a
field), then the rational numbers $a_j(\theta)$ are the coefficients
of its minimal polynomial; in general they are
the coefficients of a power of its minimal polynomial.
The most prominent of these functions are the multiplicative
function $a_n(x) = x_1 \cdot x_2 \cdots x_n$, called the
norm function, usually denoted $x \mapsto N_{K/\mathbb{Q}}(x)$, and the
additive function $a_1(x) = x_1 + x_2 + \cdots + x_n$, called the
trace function, usually denoted $x \mapsto \mathrm{trace}_{K/\mathbb{Q}}(x)$.

The trace function can be used to define a fundamental
symmetric bilinear form on the $\mathbb{Q}$-vector space $K$,

$$\langle x, y \rangle = \mathrm{trace}_{K/\mathbb{Q}}(x \cdot y),$$

which turns out to be nondegenerate. This nondegeneracy,
together with the fact that if $x$, $y$ are both algebraic
integers, then $\langle x, y \rangle$ is an ordinary integer, can be used
to show that the ring $O(K)$ of all algebraic integers in $K$
is finitely generated as an additive group. More specifically,
there is a basis of algebraic integers in $K$, that is,
a finite set $\{\theta_1, \theta_2, \ldots, \theta_n\}$, such that any other algebraic
integer in $K$ can be expressed as an “ordinary”
integer combination of the numbers $\theta_i$.

Let us summarize this structure. The number field
$K$ is a finite-dimensional vector space over $\mathbb{Q}$ and
comes equipped with a nondegenerate bilinear symmetric
form $(x, y) \mapsto \langle x, y \rangle$, and also with a lattice
$O(K) \subset K$. Moreover, the restriction of the bilinear form
to $O(K)$ takes on integral values.

The discriminant of $K$, denoted $D(K)$, is defined to be
the determinant [III.15] of the matrix whose $(i,j)$-entry
is $\langle \theta_i, \theta_j \rangle$, for $\{\theta_1, \theta_2, \ldots, \theta_n\}$ a basis of the lattice
$O(K)$; this determinant does not depend on the basis
chosen.
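
For quadratic fields the definition can be evaluated by hand or by machine. The sketch below works in $\mathbb{Q}(\sqrt{d})$ for a squarefree integer $d$, stores $a + b\sqrt{d}$ as the pair $(a, b)$, and uses the fact that the trace of $a + b\sqrt{d}$ is $2a$. It also assumes the standard integral basis ($1$ and $(1 + \sqrt{d})/2$ when $d$ leaves remainder 1 on division by 4, and $1$ and $\sqrt{d}$ otherwise), which is not derived in the text; all function names are illustrative.

```python
from fractions import Fraction


def trace_form(x, y, d: int):
    """<x, y> = trace(x*y) for x = a + b*sqrt(d), y = c + e*sqrt(d).

    The product is (a*c + d*b*e) + (a*e + b*c)*sqrt(d), and the trace of
    u + v*sqrt(d) is 2*u.
    """
    (a, b), (c, e) = x, y
    return 2 * (a * c + d * b * e)


def integral_basis(d: int):
    """Assumed standard basis of the ring of integers of Q(sqrt(d))."""
    one = (Fraction(1), Fraction(0))
    if d % 4 == 1:
        return [one, (Fraction(1, 2), Fraction(1, 2))]
    return [one, (Fraction(0), Fraction(1))]


def discriminant(d: int):
    """Determinant of the 2x2 matrix of values <theta_i, theta_j>."""
    t1, t2 = integral_basis(d)
    g11 = trace_form(t1, t1, d)
    g12 = trace_form(t1, t2, d)
    g22 = trace_form(t2, t2, d)
    return g11 * g22 - g12 * g12


for d in (-3, -1, 2, 5, 13):
    print(d, discriminant(d))
```

The values produced are $d$ itself when $d$ leaves remainder 1 on division by 4 and $4d$ otherwise, so in each case the discriminant is either divisible by 4 or congruent to 1 modulo 4.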


The discriminant represents important information
about the number field $K$. For one thing, there is a natural
generalization to any number field of the notions
of splitting and ramification that we discussed for quadratic
fields, and the prime divisors $p$ of $D(K)$ are precisely
those prime numbers that ramify in the field
extension $K$. By a theorem of Minkowski [VI.64], the
absolute value of the discriminant $D(K)$ of a number
field $K$ of degree $n$ is always greater than

$$\left(\frac{\pi}{4}\right)^{n} \left(\frac{n^n}{n!}\right)^{2}.$$

This is greater than 1 unless $K$ is the field of rational
numbers. It follows that any nontrivial extension of the
field of rational numbers has some prime that ramifies
in it, a result that would be very hard to prove without
the help of the algebraic structures we have just
defined. This integer $D(K)$ really is quite a discriminating
“tag” for our number field $K$, for, by a theorem
of Hermite [VI.47], given any integer $D$ there are only
finitely many different number fields with discriminant
equal to $D$. (Not all integers can be discriminants: as is
true for quadratic number fields, the integers $D$ that are
discriminants are either divisible by 4 or else congruent
to 1 modulo 4.)

18 On the Size(s) of the Absolute Values of 
All Conjugates of an Algebraic Integer 

As we have just seen, the coefficients of the minimal
polynomial for an algebraic integer $\theta$ are given by the
ordinary integers $A_j(\theta_1, \theta_2, \ldots, \theta_n)$, where the numbers
$\theta_i$ are all the conjugates of $\theta$. The sizes of all these
coefficients must therefore all be less than some universal
number $M$ that depends only on the degree of $\theta$ and
the largest absolute value of any of its conjugates. As
a consequence, given any $n$ and any positive number
$B$, there are only finitely many algebraic integers $\theta$ of
degree at most $n$ such that the absolute values of $\theta$
and its conjugates are all less than $B$. (This is because
for any $n$ and $M$ there are only finitely many polynomials
of degree less than or equal to $n$ with the absolute
values of all their integer coefficients at most $M$.) This
finiteness result is the key to the following observation,
due to Kronecker: if $\theta$ is an algebraic integer and if
the absolute values of $\theta$ and of all of its conjugates are
equal to 1, then $\theta$ is a root of unity. Indeed, all the powers
of $\theta$ have degree at most that of $\theta$, and they enjoy
the same property: their absolute value, and that of all
their conjugates, is equal to 1. Consequently, there are
only finitely many such algebraic integers, from which
it follows that there must be at least one coincidence
of the form $\theta^a = \theta^b$ for different $a$ and $b$. But this can
happen only if $\theta$ is a root of unity.

19 Weil Numbers 

To follow this thread for just a bit, let us generalize
the hypothesis of Kronecker’s observation, and define
a Weil number$^{3}$ of absolute value $r$ to be a nonzero
algebraic integer such that it and all of its conjugates
have the same absolute value $r$. By the discussion in
the previous section there are only finitely many distinct
Weil numbers of given degree and absolute value.
By Kronecker’s theorem, which we have just described,
the Weil numbers of absolute value 1 are precisely the
roots of unity. Here are further basic facts that you
might try to prove. First, the quadratic Weil numbers
$\omega$ are precisely those quadratic algebraic integers such
that $|\mathrm{trace}(\omega)| < 2\sqrt{|N(\omega)|} = 2\sqrt{|\omega\omega'|}$, where $\omega'$ is
the (algebraic) conjugate of $\omega$. Second, if $p$ is prime
then a quadratic Weil number $\omega$ of absolute value $\sqrt{p}$ is
a prime element of the (unique) ring of quadratic integers
$R_d$ that contains $\omega$, and therefore gives a prime
factorization $\omega\omega' = \pm p$ of the integer $p$ in that ring.

Weil numbers of absolute value $p^{\nu/2}$, where $p$ is
again a prime number and $\nu$ is a natural number, are
extremely important in arithmetic: they hold the key
to counting numbers of rational solutions of systems
of polynomial equations over finite fields. For just one
concrete example, the Gaussian integer $\omega = -1 + i$ and
its algebraic conjugate (which, in this instance, is also
its complex conjugate) $\bar{\omega} = -1 - i$ are Weil numbers (of
absolute value $\sqrt{2}$) that control the number of solutions
of the equation $y^2 - y = x^3 - x$ over all finite fields of
size a power of 2. Specifically, the number of solutions
of that equation over a field of order $2^\nu$ is given by the
formula

$$2^\nu - (-1 - i)^\nu - (-1 + i)^\nu$$

(which is an ordinary integer). This leads to another
immense chapter of mathematics.
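
For small fields the count can be verified by brute force. The sketch below represents elements of the field with $2^\nu$ elements as bit patterns of polynomials over the field with two elements, reduced modulo a fixed irreducible polynomial; the table `IRREDUCIBLE` and the function names are choices made for this illustration only. In characteristic 2 subtraction coincides with addition, which is the bitwise exclusive-or here, so the equation reads $y^2 + y = x^3 + x$.

```python
# One irreducible polynomial of each degree v over GF(2), as bit patterns:
# x + 1, x^2 + x + 1, x^3 + x + 1, x^4 + x + 1.
IRREDUCIBLE = {1: 0b11, 2: 0b111, 3: 0b1011, 4: 0b10011}


def gf_mul(a: int, b: int, v: int) -> int:
    """Multiply a and b in the field with 2^v elements."""
    modulus = IRREDUCIBLE[v]
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> v:          # degree reached v, so reduce by the modulus
            a ^= modulus
    return result


def count_solutions(v: int) -> int:
    """Number of pairs (x, y) in the field with y^2 - y = x^3 - x."""
    q = 1 << v
    count = 0
    for x in range(q):
        rhs = gf_mul(gf_mul(x, x, v), x, v) ^ x
        for y in range(q):
            if (gf_mul(y, y, v) ^ y) == rhs:
                count += 1
    return count


def weil_formula(v: int) -> int:
    """2^v - (-1 - i)^v - (-1 + i)^v, rounded to the nearest integer."""
    omega = complex(-1, 1)
    value = 2 ** v - omega.conjugate() ** v - omega ** v
    return round(value.real)


for v in (1, 2, 3, 4):
    print(v, count_solutions(v), weil_formula(v))
```

The two columns agree for these four fields; the brute-force search is quadratic in the field size, so it is only practical for small $\nu$.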

20 Epilogue 

The single symmetry $a \mapsto a'$, the algebraic conjugation
in the rings $R_d$ that we have discussed, gave birth,
thanks to Abel [VI.33] and Galois [VI.41] in the beginning
of the nineteenth century, to the rich study of


3. This is a weaker condition than is usually required for Weil numbers,
but our deviation from standard usage should not be the cause
of too much confusion.


(Galois) groups of symmetries of general number fields
(see THE INSOLUBILITY OF THE QUINTIC [V.24]). This
study continues with great intensity, since these Galois
groups and their linear representations hold the key
to a very detailed understanding of number fields. In
its modern dress, algebraic number theory is closely
connected with what is often called arithmetic geometry
[IV.5]. Kronecker’s dream of getting explicit control
of a wealth of algebraic number theoretic material
by expressing algebraic numbers in terms of natural
analytic functions has not yet been fully realized. Nevertheless,
the scope of this dream (and, one might also
add, the supply of natural analytic and algebraic functions)
has expanded substantially: the full range of algebraic
geometry and group representation theory is now
being brought to bear on it. This is done, for example,
by the Langlands program, which among other things
works with objects known as Shimura varieties. On the
one hand, these varieties have close connections with
the theory of group representations and classical algebraic
geometry, which greatly helps us to understand
them. On the other hand, they are a rich source of concrete
linear representations of Galois groups of number
fields. This program, one of the glories of current mathematics,
will, I expect, make a terrific chapter for a Companion
to Mathematics to be written at the beginning of
the next century.

IV.2 Analytic Number Theory

Andrew Granville 


1 Introduction 

What is number theory? One might have thought that it
was simply the study of numbers, but that is too broad
a definition, since numbers are almost ubiquitous in
mathematics. To see what distinguishes number theory
from the rest of mathematics, let us look at the equation
$x^2 + y^2 = 15\,925$, and consider whether it has any
solutions. One answer is that it certainly does: indeed,
the solution set forms a circle of radius $\sqrt{15\,925}$ in the
plane. However, a number theorist is interested in integer
solutions, and now it is much less obvious whether
any such solutions exist.

A useful first step in considering the above question
is to notice that 15 925 is a multiple of 25: in fact, it is
$25 \times 637$. Furthermore, the number 637 can be decomposed
further: it is $49 \times 13$. That is, $15\,925 = 5^2 \times 7^2 \times 13$.
This information helps us a lot, because if we can find
integers $a$ and $b$ such that $a^2 + b^2 = 13$, then we can
multiply them by $5 \times 7 = 35$ and we will have a solution
to the original equation. Now we notice that $a = 2$ and
$b = 3$ works, since $2^2 + 3^2 = 13$. Multiplying these numbers
by 35, we obtain the solution $70^2 + 105^2 = 15\,925$
to the original equation.
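
A direct search confirms the solution found by factoring and shows whether there are others. The sketch below simply tries every candidate $x$ with $x \leq y$ and tests whether the remainder is a perfect square; the function name `two_square_solutions` is illustrative.

```python
from math import isqrt


def two_square_solutions(n: int):
    """All pairs (x, y) of integers with 0 <= x <= y and x*x + y*y == n."""
    solutions = []
    for x in range(isqrt(n // 2) + 1):   # x <= y forces x*x <= n/2
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            solutions.append((x, y))
    return solutions


print(two_square_solutions(13))       # the small equation a^2 + b^2 = 13
print(two_square_solutions(15925))    # includes the scaled solution (70, 105)
```

The search takes about $\sqrt{n}$ steps, which is instant here but quickly becomes hopeless for large $n$, whereas the factoring argument explains why a solution exists.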


As this simple example shows, it is often useful to
decompose positive integers multiplicatively into components
that cannot be broken down any further. These
components are called prime numbers, and the fundamental
theorem of arithmetic [V.16] states that
every positive integer can be written as a product of
primes in exactly one way. That is, there is a one-to-one
correspondence between positive integers and finite
products of primes. In many situations we know what
we need to know about a positive integer once we have
decomposed it into its prime factors and understood
those, just as we can understand a lot about molecules
by studying the atoms of which they are composed. For
example, it is known that the equation $x^2 + y^2 = n$ has
an integer solution if and only if every prime of the form
$4m + 3$ occurs an even number of times in the prime factorization
of $n$. (This tells us, for instance, that there are
no integer solutions to the equation $x^2 + y^2 = 13\,475$,
since $13\,475 = 5^2 \times 7^2 \times 11$, and 11 appears an odd
number of times in this product.)
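
The criterion can be compared against exhaustive search for small $n$. The sketch below factors $n$ by trial division, applies the parity test to the primes of the form $4m + 3$, and checks the verdict against a brute-force search; the function names are illustrative, and trial division is used only because the numbers are small.

```python
from math import isqrt


def factorize(n: int) -> dict:
    """Prime factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_sum_of_two_squares(n: int) -> bool:
    """True when every prime of the form 4m + 3 divides n an even number of times."""
    return all(e % 2 == 0 for p, e in factorize(n).items() if p % 4 == 3)


def has_solution_by_search(n: int) -> bool:
    """Brute-force test for integers x, y with x*x + y*y == n."""
    for x in range(isqrt(n) + 1):
        rest = n - x * x
        if isqrt(rest) ** 2 == rest:
            return True
    return False


print(factorize(15925), is_sum_of_two_squares(15925))
print(factorize(13475), is_sum_of_two_squares(13475))
```

On every $n$ below 2000 the criterion and the search agree, which is of course only evidence for the theorem, not a proof.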

Once one begins the process of determining which 
integers are primes and which are not, it is soon appar- 
ent that there are many primes. However, as one goes 
further and further, the primes seem to consist of a 
smaller and smaller proportion of the positive integers. 
They also seem to come in a somewhat irregular pat- 
tern, which raises the question of whether there is any 
formula that describes all of them. Failing that, can one 
perhaps describe a large class of them? We can also ask 
whether there are infinitely many primes. If there are, 
can we quickly determine how many there are up to 
a given point? Or at least give a good estimate for this 
number? Finally, when one has spent long enough look- 
ing for primes, one cannot help but ask whether there 
is a quick way of recognizing them. This last question is 
discussed in computational number theory [IV.3]; 
the rest motivate the present article. 

Now that we have discussed what marks number
theory out from the rest of mathematics, we are ready
to make a further distinction: between algebraic and
analytic number theory. The main difference is that
in algebraic number theory (which is the main topic
of algebraic numbers [IV.1]) one typically considers
questions with answers that are given by exact formulas,
whereas in analytic number theory, the topic of
this article, one looks for good approximations. For the
sort of quantity that one estimates in analytic number
theory, one does not expect an exact formula to
exist, except perhaps one of a rather artificial and unilluminating
kind. One of the best examples of such a
quantity is one we shall discuss in detail: the number
of primes less than or equal to $x$.

Since we are discussing approximations, we will need
terminology that allows us to give some idea of the
quality of an approximation. Suppose, for example, that
we have a rather erratic function $f(x)$ but are able to
show that, once $x$ is large enough, $f(x)$ is never bigger
than $25x^2$. This is useful because we understand
the function $g(x) = x^2$ quite well. In general, if we
can find a constant $c$ such that $|f(x)| \leq c\,g(x)$ for
every $x$, then we write $f(x) = O(g(x))$. A typical usage
occurs in the sentence “the average number of prime
factors of an integer up to $x$ is $\log\log x + O(1)$”; in
other words, there exists some constant $c > 0$ such
that $|\text{the average} - \log\log x| \leq c$ once $x$ is sufficiently
large.
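
The quoted statement about the average number of prime factors can be explored numerically. The sketch below counts distinct prime factors with a sieve; counting distinct primes rather than counting with multiplicity is an assumption made here for simplicity, and only the bounded difference, not the value of the constant $c$, is being illustrated. The function name is hypothetical.

```python
import math


def average_distinct_prime_factors(x: int) -> float:
    """Average over n = 1..x of the number of distinct primes dividing n."""
    counts = [0] * (x + 1)
    for p in range(2, x + 1):
        if counts[p] == 0:                    # untouched so far, so p is prime
            for multiple in range(p, x + 1, p):
                counts[multiple] += 1
    return sum(counts[1:]) / x


for x in (10**3, 10**4, 10**5):
    average = average_distinct_prime_factors(x)
    print(x, round(average, 4), round(math.log(math.log(x)), 4),
          round(average - math.log(math.log(x)), 4))
```

The last column, the difference between the average and $\log\log x$, stays bounded as $x$ grows, which is exactly what the $O(1)$ term asserts.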

We write $f(x) \sim g(x)$ if $\lim_{x \to \infty} f(x)/g(x) = 1$; and
also $f(x) \approx g(x)$ when we are being a little less precise,
that is, when we want to say that $f(x)$ and $g(x)$ come
close when $x$ is sufficiently large, but we cannot be, or
do not want to be, more specific about what we mean
by “come close.”

It is convenient for us to use the notation $\sum$ for sums
and $\prod$ for products. Typically we will indicate beneath
the symbol what terms the sum, or product, is to be
taken over. For example, $\sum_{m \geq 2}$ will be a sum over all
integers $m$ that are greater than or equal to 2, whereas
$\prod_{p\ \mathrm{prime}}$ will be a product over all primes $p$.

2 Bounds for the Number of Primes 

Ancient Greek mathematicians knew that there were
infinitely many primes. Their beautiful proof by contradiction
goes as follows. Suppose that there are only
finitely many primes, say $k$ of them, which we will
denote by $p_1, p_2, \ldots, p_k$. What are the prime factors of
$p_1 p_2 \cdots p_k + 1$? Since this number is greater than 1 it
must have at least one prime factor, and this must be
$p_j$ for some $j$ (since all primes are contained among
$p_1, p_2, \ldots, p_k$). But then $p_j$ divides both $p_1 p_2 \cdots p_k$
and $p_1 p_2 \cdots p_k + 1$, and hence their difference, 1, which
is impossible.

Many people dislike this proof, since it does not actually
exhibit infinitely many primes: it merely shows that
there cannot be finitely many. It is more or less possible
to correct this deficiency by defining the sequence
$x_1 = 2$, $x_2 = 3$, and $x_{k+1} = x_1 x_2 \cdots x_k + 1$ for each
$k \geq 2$. Then each $x_k$ must contain at least one prime
factor, $q_k$ say, and these prime factors must be distinct,
since if $k < \ell$, then $q_k$ divides $x_k$, which divides $x_\ell - 1$,
while $q_\ell$ divides $x_\ell$. This gives us an infinite sequence
of primes.
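
The sequence grows very fast, but its first few terms are small enough to factor by trial division. The sketch below builds the terms from the recurrence and takes the smallest prime factor of each as $q_k$; choosing the smallest factor is just one way of picking a prime factor, and the function names are illustrative.

```python
def euclid_sequence(count: int):
    """First `count` terms of x_1 = 2, x_2 = 3, x_(k+1) = x_1 * ... * x_k + 1."""
    terms = [2, 3]
    while len(terms) < count:
        product = 1
        for term in terms:
            product *= term
        terms.append(product + 1)
    return terms[:count]


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n >= 2, found by trial division."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


terms = euclid_sequence(7)
primes = [smallest_prime_factor(x) for x in terms]
for x, q in zip(terms, primes):
    print(x, q)
```

The chosen primes are all different, as the argument predicts, although they do not appear in increasing order.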

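In the eighteenth century Euler [VI.19] gave a different
proof that there are infinitely many primes, one
that turned out to be highly influential in what was
to come later. Suppose again that the list of primes
is $p_1, p_2, \dots, p_k$. As we have mentioned, the fundamental
theorem of arithmetic implies that there is a
one-to-one correspondence between the set of all positive
integers and the set of products of the primes, which, if
those are the only primes, is the set
$\{p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k} : a_1, a_2, \dots, a_k \ge 0\}$.
But, as Euler observed, this implies
that a sum involving the elements of the first set should
equal the analogous sum involving the elements of the
second set:

$$\begin{aligned}
\sum_{n \ge 1} \frac{1}{n^s}
  &= \sum_{a_1, a_2, \dots, a_k \ge 0} \frac{1}{(p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k})^s} \\
  &= \Big(\sum_{a_1 \ge 0} \frac{1}{(p_1^{a_1})^s}\Big) \Big(\sum_{a_2 \ge 0} \frac{1}{(p_2^{a_2})^s}\Big) \cdots \Big(\sum_{a_k \ge 0} \frac{1}{(p_k^{a_k})^s}\Big) \\
  &= \prod_{j=1}^{k} \Big(1 - \frac{1}{p_j^s}\Big)^{-1}.
\end{aligned}$$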



The last equality holds because each sum in the second-last
line is the sum of a geometric progression. Euler
then noted that if we take $s = 1$, the right-hand side
equals some rational number (since each $p_j > 1$)
whereas the left-hand side equals $\infty$. This is a
contradiction, so there cannot be finitely many primes. (To see
why the left-hand side is infinite when $s = 1$, note that
$1/n \ge \int_n^{n+1} (1/t)\,dt$ since the function $1/t$ is
decreasing, and therefore
$\sum_{n=1}^{N-1} 1/n \ge \int_1^N (1/t)\,dt = \log N$,
which tends to $\infty$ as $N \to \infty$.)

During the proof above, we gave a formula for $\sum n^{-s}$
under the false assumption that there are only finitely
many primes. To correct it, all we have to do is rewrite
it in the obvious way without that assumption:

$$\sum_{n\ \text{a positive integer}} \frac{1}{n^s} = \prod_{p\ \text{prime}} \Big(1 - \frac{1}{p^s}\Big)^{-1}. \tag{1}$$

Now, however, we need to be a little careful about
whether the two sides of the formula converge. It is
safe to write down such a formula when both sides are
absolutely convergent, and this is true when $s > 1$. (An
infinite sum or product is absolutely convergent if the
value does not change when we take the terms in any
order we want.)
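
As a numerical sanity check of (1) in the convergent range, here is a
small Python sketch comparing a truncated sum with a truncated product
at $s = 2$. The helper names and the cut-offs are illustrative choices
of ours; the comparison value $\pi^2/6$ is the classical value of the
sum at $s = 2$.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Primes <= limit (limit >= 1), by a basic sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out p*p, p*p + p, ... in one slice assignment.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def zeta_partial_sum(s: float, terms: int) -> float:
    """Left-hand side of (1), truncated to n <= terms."""
    return math.fsum(n ** -s for n in range(1, terms + 1))


def euler_partial_product(s: float, prime_limit: int) -> float:
    """Right-hand side of (1), truncated to primes p <= prime_limit."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product /= 1.0 - p ** -s
    return product


if __name__ == "__main__":
    print(zeta_partial_sum(2.0, 10**5))
    print(euler_partial_product(2.0, 10**5))
    print(math.pi ** 2 / 6)
```

Both truncations approach the same limit only because $s > 1$; at
$s = 1$ neither side settles down, which is the point of the
discussion that follows.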
Like Euler, we want to be able to interpret what happens
to (1) when $s = 1$. Since both sides converge and
are equal when $s > 1$, the natural thing to do is consider
their common limit as s tends to 1 from above.
To do this we note, as above, that the left-hand side of
(1) is well approximated by

$$\int_1^\infty \frac{dt}{t^s} = \frac{1}{s-1},$$

so it diverges as $s \to 1^+$. We deduce that

$$\prod_{p\ \text{prime}} \Big(1 - \frac{1}{p}\Big) = 0. \tag{2}$$
Upon taking logarithms and discarding negligible
terms, this implies that

$$\sum_{p\ \text{prime}} \frac{1}{p} = \infty. \tag{3}$$
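
For readers who want the "negligible terms" spelled out, here is one
way to fill in the step from (2) to (3). It is a sketch using only the
series for the logarithm; the cut-off y is introduced here for
convenience.

```latex
\begin{aligned}
-\log \prod_{p \le y} \Big(1 - \frac{1}{p}\Big)
  &= \sum_{p \le y} \sum_{m \ge 1} \frac{1}{m\,p^m}
   = \sum_{p \le y} \frac{1}{p} + \sum_{p \le y} \sum_{m \ge 2} \frac{1}{m\,p^m}, \\
0 \le \sum_{p \le y} \sum_{m \ge 2} \frac{1}{m\,p^m}
  &\le \sum_{p \le y} \sum_{m \ge 2} \frac{1}{p^m}
   = \sum_{p \le y} \frac{1}{p(p-1)}
   \le \sum_{n \ge 2} \frac{1}{n(n-1)} = 1 .
\end{aligned}
```

By (2) the left-hand side of the first line tends to infinity with y,
while the double sum stays bounded by 1, so $\sum_{p \le y} 1/p$ must
itself tend to infinity.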


So how numerous are the primes? One way to get an
idea is to determine the behavior of the sum analogous
to (3) for other sequences of integers. For instance,
$\sum_{n \ge 1} 1/n^2$ converges, so the primes are, in this sense,
more numerous than the squares. This argument works
if we replace the power 2 by any $s > 1$, since then, as
we have just observed, the sum $\sum_{n \ge 1} 1/n^s$ is about
$1/(s-1)$ and in particular converges. In fact, since
$\sum_{n \ge 2} 1/(n(\log n)^2)$ converges, we see that the primes
are in the same sense more numerous than the numbers
$\{n(\log n)^2 : n \ge 2\}$, and hence there are infinitely
many integers x for which the number of primes less
than or equal to x is at least $x/(\log x)^2$.

Thus, there seem to be primes in abundance, but 
we would also like to verify our observations, made 
from calculations, that the primes constitute a smaller 
and smaller proportion of the integers as the integers 
become larger and larger. The easiest way to see this is 
to try to count the primes using the “sieve of Eratos- 
thenes.” In the sieve of Eratosthenes one starts with all 
the positive integers up to some number x. From these, 
one deletes the numbers 4, 6, 8 and so on— that is, all 
multiples of 2 apart from 2 itself. One then takes the 
first undeleted integer greater than 2, which is 3, and 
deletes all its multiples — again, not including the num- 
ber 3 itself. Then one removes all multiples of 5 apart 
from 5, and so on. By the end of this process, one is left 
with the primes up to x. 
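
The procedure just described translates almost word for word into
code. The following is a minimal Python sketch (the function name is
ours); it deletes every proper multiple of each surviving number, which
is simple rather than optimized.

```python
def sieve_of_eratosthenes(x: int) -> list[int]:
    """Return the primes up to x, following the sieve described above."""
    if x < 2:
        return []
    deleted = [False] * (x + 1)
    primes = []
    for n in range(2, x + 1):
        if deleted[n]:
            continue
        # n is the first undeleted integer not yet used: it is prime.
        primes.append(n)
        # Delete all multiples of n apart from n itself.
        for multiple in range(2 * n, x + 1, n):
            deleted[multiple] = True
    return primes


if __name__ == "__main__":
    print(sieve_of_eratosthenes(50))
```

In practice one can stop sieving once n exceeds the square root of x,
for the reason given below: every composite number has a prime factor
no bigger than its square root.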

This suggests a way to guess at how many there are.
After deleting every second integer up to x other than
2 (which we call “sieving by 2”) one is left with roughly
half the integers up to x; after sieving by 3, one is left
with roughly two thirds of those that had remained;
continuing like this we expect to have about

$$x \prod_{p \le y} \Big(1 - \frac{1}{p}\Big) \tag{4}$$

integers left by the time we have sieved with all the
primes up to y. Once $y = \sqrt{x}$ the undeleted integers
are 1 and the primes up to x, since every composite
has a prime factor no bigger than its square root. So, is
(4) a good approximation for the number of primes up
to x when $y = \sqrt{x}$?

To answer this question, we need to be more precise
about what the formula in (4) is estimating. It is supposed
to approximate the number of integers up to x
that have no prime factors less than or equal to y, plus
the number of primes up to y. The so-called
inclusion-exclusion principle can be used to show that the
approximation given in (4) is accurate to within $2^k$, where k is
the number of primes less than or equal to y. Unless k
is very small, this error term of $2^k$ is far larger than the
quantity we are trying to estimate, and the approximation
is useless. It is quite good if k is less than a small
constant times log x, but, as we have seen, this is far
less than the number of primes we expect up to y if
$y \approx \sqrt{x}$. Thus it is not clear whether (4) can be used
to obtain a good estimate for the number of primes up
to x. What we can do, however, is use this argument to
give an upper bound for the number of primes up to
x, since the number of primes up to x is never more
than the number of integers up to x that are free of
prime factors less than or equal to y, plus the number
of primes up to y, which is no more than $2^k$ plus the
expression in (4).
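
To make the role of the $2^k$ error concrete, here is a hedged Python
sketch of the inclusion-exclusion count. The function names are ours.
The exact count is a signed sum of $2^k$ terms of the form
$\lfloor x/d \rfloor$, one for each product d of distinct primes up to
y; replacing each floor by $x/d$ gives exactly the product in (4) and
costs less than 1 per term, which is where the bound $2^k$ comes from.

```python
from itertools import combinations
from math import prod


def count_unsieved(x: int, primes: list[int]) -> int:
    """Count the integers in 1..x divisible by none of the given primes.

    Inclusion-exclusion: sum over all subsets S of the primes of
    (-1)^{|S|} * floor(x / product(S)).  This has 2^k terms.
    """
    total = 0
    for r in range(len(primes) + 1):
        for subset in combinations(primes, r):
            total += (-1) ** r * (x // prod(subset))
    return total


def sieve_estimate(x: int, primes: list[int]) -> float:
    """The heuristic x * prod(1 - 1/p), as in formula (4)."""
    return x * prod(1 - 1 / p for p in primes)


if __name__ == "__main__":
    small_primes = [2, 3, 5, 7]
    print(count_unsieved(100, small_primes), sieve_estimate(100, small_primes))
```

The running time doubles with every extra prime, which mirrors the
growth of the error term: the method is only informative while k is
small compared with log x.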

Now, by (2), we know that as y gets larger and larger
the product $\prod_{p \le y}(1 - 1/p)$ converges to zero.
Therefore, for any small positive number $\varepsilon$ we can find a
y such that $\prod_{p \le y}(1 - 1/p) < \varepsilon/2$. Since every
term in this product is at least 1/2, the product is at least $1/2^k$.
Hence, for any $x > 2^{2k}$ our error term, $2^k$, is no bigger
than the quantity in (4), and therefore the number of
primes up to x is no larger than twice (4), which, by our
choice of y, is less than $\varepsilon x$. Since we were free to make
$\varepsilon$ as small as we liked, the primes are indeed a vanishing
proportion of all the integers, as we predicted.

Even though the error term in the inclusion-exclusion
principle is too large for us to use that method to
estimate (4) when $y = \sqrt{x}$, we can still hope that (4) is
a good approximation for the number of primes up to
x: perhaps a different argument would give us a much
smaller error term. And this turns out to be the case:
in fact, the error never gets much bigger than (4). However,
when $y = \sqrt{x}$ the number of primes up to x is
actually about 8/9 times (4). So why does (4) not give
a good approximation? After sieving with prime p we
supposed that roughly 1 in every p of the remaining
integers were deleted: a careful analysis yields that this
can be justified when p is small, but that this becomes
an increasingly poor approximation of what really happens
for larger p; in fact (4) does not give a correct
approximation once y is bigger than a fixed power of x.
So what goes wrong? In the hope that the proportion
is roughly 1/p lies the unspoken assumption that the
consequences of sieving by p are independent of what
happened with the primes smaller than p. But if the
primes under consideration are no longer small, then
this assumption is false. This is one of the main reasons
that it is hard to estimate the number of primes
up to x, and indeed similar difficulties lie at the heart
of many related problems.
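
One can watch this discrepancy numerically. The Python sketch below
(helper names are ours) computes the ratio of the true number of primes
up to x to the heuristic (4) with $y = \sqrt{x}$. By Mertens's theorem
the ratio tends to $e^{\gamma}/2 \approx 0.89$, where $\gamma$ is
Euler's constant, but the convergence is only logarithmic, so for
accessible x one should expect values still some way above the
limiting "about 8/9."

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Primes <= limit (limit >= 1), by a basic sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def sieve_heuristic_ratio(x: int) -> float:
    """Return pi(x) divided by x * prod_{p <= sqrt(x)} (1 - 1/p)."""
    primes = primes_up_to(x)
    y = math.isqrt(x)
    estimate = float(x)
    for p in primes:
        if p > y:
            break
        estimate *= 1 - 1 / p
    return len(primes) / estimate


if __name__ == "__main__":
    for exponent in (4, 5, 6, 7):
        print(10**exponent, sieve_heuristic_ratio(10**exponent))
```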

One can refine the bounds given above but they do
not seem to yield an asymptotic estimate for the primes
(that is, an estimate which is correct to within a factor
that tends to 1 as x gets large). The first good
guesses for such an estimate emerged at the beginning
of the nineteenth century, none better than what
emerges from an observation of Gauss [VI.26], made
when studying tables of primes up to three million
at sixteen years of age, that “the density of primes at
around x is about 1/log x.” Interpreting this, we guess
that the number of primes up to x is about

$$\sum_{n=2}^{x} \frac{1}{\log n} \approx \int_2^x \frac{dt}{\log t}.$$

Let us compare this prediction (rounded to the nearest
integer) with the latest data on numbers of primes,
discovered by a mixture of ingenuity and computational
power. Table 1 shows the actual numbers of primes
up to various powers of 10 together with the difference
between these numbers and what Gauss’s formula
gives. The differences are far smaller than the numbers
themselves, so his prediction is amazingly accurate.
It does seem always to be an overcount, but since
the width of the last column is about half that of the
central one it appears that the difference is something
like $\sqrt{x}$.
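
The same comparison can be reproduced on a small scale. The following
Python sketch (helper names are ours) uses the sum
$\sum_{2 \le n \le x} 1/\log n$ as a stand-in for the integral, from
which it differs by a bounded amount, and counts primes with a basic
sieve; it is meant only for x up to around $10^7$.

```python
import math


def prime_count(x: int) -> int:
    """Number of primes <= x (x >= 1), by a basic sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)


def gauss_prediction(x: int) -> float:
    """Gauss's guess: the sum of 1/log n over 2 <= n <= x."""
    return math.fsum(1 / math.log(n) for n in range(2, x + 1))


if __name__ == "__main__":
    for exponent in (3, 4, 5, 6):
        x = 10**exponent
        actual = prime_count(x)
        print(x, actual, round(gauss_prediction(x)) - actual, round(math.sqrt(x)))
```

Printing $\sqrt{x}$ alongside the overcount makes it easy to see that
the two have the same order of magnitude in this range.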

Table 1  Primes up to various x, and the overcount in Gauss’s prediction.

    x          $\pi(x) = \#\{\text{primes} \le x\}$    Overcount: $\int_2^x \frac{dt}{\log t} - \pi(x)$

    $10^{8}$                        5 761 455                   753
    $10^{9}$                       50 847 534                 1 700
    $10^{10}$                     455 052 511                 3 103
    $10^{11}$                   4 118 054 813                11 587
    $10^{12}$                  37 607 912 018                38 262
    $10^{13}$                 346 065 536 839               108 970
    $10^{14}$               3 204 941 750 802               314 889
    $10^{15}$              29 844 570 422 669             1 052 618
    $10^{16}$             279 238 341 033 925             3 214 631
    $10^{17}$           2 623 557 157 654 233             7 956 588
    $10^{18}$          24 739 954 287 740 860            21 949 554
    $10^{19}$         234 057 667 276 344 607            99 877 774
    $10^{20}$       2 220 819 602 560 918 840           222 744 643
    $10^{21}$      21 127 269 486 018 731 928           597 394 253
    $10^{22}$     201 467 286 689 315 906 290         1 932 355 207

In the 1930s, the great probability theorist Cramér
gave a probabilistic way of interpreting Gauss’s prediction.
We can represent the primes as a sequence of 0s
and 1s: putting a “1” each time we encounter a prime,
and a “0” otherwise, we obtain, starting from 3, the
sequence 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, .... Cramér’s idea is
to suppose that this sequence, which represents the
primes, has the same properties as a “typical” sequence
of 0s and 1s, and to use this principle to make precise
conjectures about the primes. More precisely, let
$X_3, X_4, \dots$ be an infinite sequence of random variables
[III.73 §4] taking the values 0 or 1, and let the
variable $X_n$ equal 1 with probability $1/\log n$ (so that
it equals 0 with probability $1 - 1/\log n$). Assume also
that the variables are independent, so for each m knowledge
about the variables other than $X_m$ tells us nothing
about $X_m$ itself. Cramér’s suggestion was that any
statement about the distribution of 1s in the sequence
that represents the primes will be true if and only if it is
true with probability 1 for his random sequences. Some
care is needed in interpreting this statement: for example,
with probability 1 a random sequence will contain
infinitely many even numbers. However, it is possible
to formulate a general principle that takes account of
such examples.
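
The random model is simple to simulate. The Python sketch below (the
function names and the choice of seed are ours) draws one sample of
$X_3, X_4, \dots, X_x$ and compares the number of 1s with its expected
value $\sum_{3 \le n \le x} 1/\log n$; the fluctuation should be of the
order of the square root of that count.

```python
import math
import random


def cramer_sample_count(x: int, rng: random.Random) -> int:
    """Number of 1s among X_3, ..., X_x in one sample of the random model.

    Each X_n is 1 with probability 1/log n, independently of the others.
    """
    return sum(1 for n in range(3, x + 1) if rng.random() < 1 / math.log(n))


def expected_count(x: int) -> float:
    """Expected number of 1s: the sum of 1/log n over 3 <= n <= x."""
    return math.fsum(1 / math.log(n) for n in range(3, x + 1))


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is repeatable
    x = 10**6
    sample = cramer_sample_count(x, rng)
    mean = expected_count(x)
    print(sample, round(mean), round(sample - mean), round(math.sqrt(x)))
```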

Here is an example of a use of the Gauss-Cramér
model. With the help of the central limit theorem
[III.73 §5] one can prove that, with probability 1, there
are

$$\int_2^x \frac{dt}{\log t} + O(\sqrt{x}\,\log x)$$

1s among the first x terms in our sequence. The model
tells us that the same should be true of the sequence
representing primes, and so we predict that

$$\#\{\text{primes up to } x\} = \int_2^x \frac{dt}{\log t} + O(\sqrt{x}\,\log x), \tag{5}$$

just as the table suggests.

The Gauss-Cramér model provides a beautiful way 
to think about distribution questions concerning the 
prime numbers, but it does not give proofs, and it does 
not seem likely that it can be made into such a tool; 
so for proofs we must look elsewhere. In analytic num- 
ber theory one attempts to count objects that appear 
naturally in arithmetic, yet which resist being counted 
easily. So far, our discussion of the primes has con- 
centrated on upper and lower bounds that follow from 
their basic definition and a few elementary properties— 
notably the fundamental theorem of arithmetic. Some 
of these bounds are good and some not so good. To 
improve on these bounds we shall do something that 
seems unnatural at first, and reformulate our question 
as a question about complex functions. This will allow 
us to draw on deep tools from analysis. 

3 The “Analysis” in Analytic Number Theory 


These analytic techniques were born in an 1859 memoir
of Riemann [VI.49], in which he looked at the function
that appears in the formula (1) of Euler, but with one
crucial difference: now he considered complex values
of s. To be precise, he defined what we now call the
Riemann zeta function as follows:

$$\zeta(s) := \sum_{n \ge 1} \frac{1}{n^s}.$$

It can be shown quite easily that this sum converges
whenever the real part of s is greater than 1, as we
have already seen in the case of real s. However, one of
the great advantages of allowing complex values of s is
that the resulting function is holomorphic [I.3 §5.6],
and we can use a process of analytic continuation to
make sense of $\zeta(s)$ for every s apart from 1. (A similar
but more elementary example of this phenomenon
is the infinite series $\sum_{n \ge 0} z^n$, which converges if and
only if $|z| < 1$. However, when it does converge, it
equals $1/(1 - z)$, and this formula defines a holomorphic
function that is defined everywhere except $z = 1$.)
Riemann proved the remarkable fact that confirming
Gauss’s conjecture for the number of primes up to x
is equivalent to gaining a good understanding of the
zeros of the function $\zeta(s)$, that is, of the values of s
for which $\zeta(s) = 0$. Riemann’s deep work gave birth to
our subject, so it seems worthwhile to at least sketch
the key steps in the argument linking these seemingly
unconnected topics.

Riemann’s starting point was Euler’s formula (1). It is
not hard to prove that this formula is valid when s is
complex, as long as its real part is greater than 1, so we
have

$$\zeta(s) = \prod_{p\ \text{prime}} \Big(1 - \frac{1}{p^s}\Big)^{-1}.$$

If we take the logarithm of both sides and then differentiate,
we obtain the equation

$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{p\ \text{prime}} \frac{\log p}{p^s - 1} = \sum_{p\ \text{prime}} \sum_{m \ge 1} \frac{\log p}{p^{ms}}.$$

We need some way to distinguish between primes $p \le x$
and primes $p > x$; that is, we want to count those
primes p for which $x/p \ge 1$, but not those with $x/p < 1$.
This can be done using the step function that takes
the value 0 for y < 1 and the value 1 for y > 1 (so
that its graph looks like a step). At y = 1, the point of
discontinuity, it is convenient to give the function the
average value, 1/2. Perron’s formula, one of the big tools
of analytic number theory, describes this step function
by an integral, as follows. For any c > 0,

$$\frac{1}{2\pi i} \int_{s:\ \mathrm{Re}(s) = c} \frac{y^s}{s}\,ds =
\begin{cases}
0 & \text{if } 0 < y < 1, \\
1/2 & \text{if } y = 1, \\
1 & \text{if } y > 1.
\end{cases}$$

The integral is a path integral along a vertical line in the
complex plane: the line consisting of all points c + it
with $t \in \mathbb{R}$. We apply Perron’s formula with $y = x/p^m$,
so that we count the term corresponding to $p^m$ when
$p^m < x$, but not when $p^m > x$. To avoid the “1/2,”
assume that x is not a prime power. In that case we
obtain

$$\begin{aligned}
\sum_{\substack{p\ \text{prime},\ m \ge 1 \\ p^m \le x}} \log p
  &= \frac{1}{2\pi i} \sum_{p\ \text{prime},\ m \ge 1} \log p \int_{s:\ \mathrm{Re}(s) = c} \Big(\frac{x}{p^m}\Big)^{s} \frac{ds}{s} \\
  &= -\frac{1}{2\pi i} \int_{s:\ \mathrm{Re}(s) = c} \frac{\zeta'(s)}{\zeta(s)}\, \frac{x^s}{s}\, ds.
\end{aligned} \tag{6}$$


We can justify swapping the order of the sum and 
the integral if c is taken large enough, since every- 
thing then converges absolutely. Now the left-hand side 
of the above equation is not counting the number of 
primes up to x but rather a “weighted” version: for each 
prime p we add a weight of log p to the count. It turns 
out, though, that Gauss’s prediction for the number of 
primes up to x follows so long as we can show that 
x is a good estimate for this weighted count when x 
is large. Notice that the sum in (6) is exactly the logarithm
of the lowest common multiple of the integers
less than or equal to x, which perhaps explains why
this weighted counting function for the primes is a natural
function to consider. Another explanation is that
if the density of primes near p is indeed about 1/ log p, 
then multiplying by a weight of log p makes the density 
everywhere about 1. 

If you know some complex analysis, then you will
know that Cauchy’s residue theorem allows one to evaluate
the integral in (6) in terms of the “residues” of
the integrand $(\zeta'(s)/\zeta(s))(x^s/s)$, that is, the poles of
this function. Moreover, for any function f that is analytic
except perhaps at finitely many points, the poles
of $f'(s)/f(s)$ are the zeros and poles of f. Each pole of
$f'(s)/f(s)$ has order 1, and the residue is simply the
order of the corresponding zero, or minus the order of
the corresponding pole, of f. Using these facts we can
obtain the explicit formula

$$\sum_{\substack{p\ \text{prime},\ m \ge 1 \\ p^m \le x}} \log p
  = x - \sum_{\rho:\ \zeta(\rho) = 0} \frac{x^{\rho}}{\rho} - \frac{\zeta'(0)}{\zeta(0)}. \tag{7}$$


Here the zeros of $\zeta(s)$ are counted with multiplicity:
that is, if $\rho$ is a zero of $\zeta(s)$ of order k, then there are k
terms for $\rho$ in the sum. It is astonishing that there can
be such a formula, an exact expression for the number
of primes up to x in terms of the zeros of a complicated
function: you can see why Riemann’s work stretched
people’s imagination and had such an impact.

Riemann made another surprising observation which
allows us to easily determine the values of $\zeta(s)$ on the
left-hand side of the complex plane (where the function
is not naturally defined). The idea is to multiply $\zeta(s)$ by
some simple function so that the resulting product $\xi(s)$
satisfies the functional equation

$$\xi(s) = \xi(1 - s) \quad \text{for all } s. \tag{8}$$

He determined that this can be done by taking
$\xi(s) = \tfrac{1}{2} s(s-1)\, \pi^{-s/2}\, \Gamma(\tfrac{1}{2}s)\, \zeta(s)$.
Here $\Gamma(s)$ is the famous
gamma function [III.31], which equals the factorial
function at positive integers (that is, $\Gamma(n) = (n-1)!$),
and is well-defined and continuous for all other s.

A careful analysis of (1) reveals that there are no
zeros of $\zeta(s)$ with Re(s) > 1. Then, with the help of
(8), we can deduce that the only zeros of $\zeta(s)$ with
Re(s) < 0 lie at the negative even integers -2, -4, ...
(the “trivial zeros”). So, to be able to use (7), we need to
determine the zeros inside the critical strip, the set of
all s such that 0 < Re(s) < 1. Here Riemann made yet
another extraordinary observation which, if true, would
allow us tremendous insight into virtually every aspect
of the distribution of primes.


The Riemann hypothesis. If $0 \le \mathrm{Re}(s) \le 1$ and $\zeta(s) = 0$,
then $\mathrm{Re}(s) = \tfrac{1}{2}$.


It is known that there are infinitely many zeros on the
line $\mathrm{Re}(s) = \tfrac{1}{2}$, crowding closer and closer together as
we go up the line. The Riemann hypothesis has been
verified computationally for the ten billion zeros of
lowest height (that is, with |Im(s)| smallest), it can be
shown to hold for at least 40% of all zeros, and it fits
nicely with many different heuristic assertions about
the distribution of primes and other sequences. Yet, for
all that, it remains an unproved hypothesis, perhaps the
most famous and tantalizing in all of mathematics.

How did Riemann think of his “hypothesis”? Riemann’s
memoir gives no hint as to how he came up
with such an extraordinary conjecture, and for a long
time afterwards it was held up as an example of the
great heights to which humankind could ascend by
pure thought alone. However, in the 1920s Siegel and
Weil [VI.93] got hold of Riemann’s unpublished notes
and from these it is evident that Riemann had been
able to determine the lowest few zeros to several decimal
places through extensive hand calculations (so
much for “pure thought alone”!). Nevertheless, the Riemann
hypothesis is a mammoth leap of imagination
and to have come up with an algorithm to calculate
zeros of $\zeta(s)$ is a remarkable achievement. (See
computational number theory [IV.3] for a discussion of
how zeros of $\zeta(s)$ can be calculated.)

If the Riemann hypothesis is true, then it is not hard
to prove the bound

$$\Big|\frac{x^{\rho}}{\rho}\Big| \le \frac{x^{1/2}}{|\mathrm{Im}(\rho)|}.$$

Inserting this into (7) one can deduce that

$$\sum_{p\ \text{prime},\ p \le x} \log p = x + O(\sqrt{x}\,\log^2 x). \tag{9}$$

This, in turn, can be “translated” into (5). In fact these
estimates hold if and only if the Riemann hypothesis is
true.

The Riemann hypothesis is not an easy thing to
understand, nor to fully appreciate. The equivalent, (5),
is perhaps easier. Another version, which I prefer, is
that, for every N > 100,

$$|\log(\mathrm{lcm}[1, 2, \dots, N]) - N| < \sqrt{N}\,(\log N)^2.$$
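
This version is pleasant to experiment with. The Python sketch below
(the function name is ours) computes $\log(\mathrm{lcm}[1, 2, \dots, N])$
as the sum of log p over all prime powers $p^m \le N$, which avoids
forming the enormous integer itself, and then tests the inequality for
a few values of N. Passing such spot checks is of course no evidence
about all N.

```python
import math


def log_lcm(n: int) -> float:
    """log(lcm[1, 2, ..., n]), as the sum of log p over prime powers p^m <= n."""
    total = 0.0
    is_composite = bytearray(n + 1)
    for p in range(2, n + 1):
        if is_composite[p]:
            continue
        for multiple in range(p * p, n + 1, p):
            is_composite[multiple] = 1
        # p contributes log p once for each power p, p^2, ... not exceeding n.
        power = p
        while power <= n:
            total += math.log(p)
            power *= p
    return total


if __name__ == "__main__":
    for n in (101, 1000, 10**4, 10**5, 10**6):
        gap = abs(log_lcm(n) - n)
        bound = math.sqrt(n) * math.log(n) ** 2
        print(n, round(gap, 3), round(bound, 3), gap < bound)
```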


To focus on the overcount in Gauss’s guesstimate for
the number of primes up to x, we use the following
approximation, which can be deduced from (7) if, and
only if, the Riemann hypothesis is true:

$$\frac{\int_2^x (1/\log t)\,dt - \#\{\text{primes} \le x\}}{\sqrt{x}/\log x}
  \approx 1 + 2 \sum_{\substack{\text{all real numbers } \gamma > 0 \\ \text{such that } \frac{1}{2} + i\gamma \\ \text{is a zero of } \zeta(s)}} \frac{\sin(\gamma \log x)}{\gamma}. \tag{10}$$

The left-hand side here is the overcount in Gauss’s
prediction for the number of primes up to x, divided
by something that grows like $\sqrt{x}$. When we looked at
the table of primes it seemed that this quantity should
be roughly constant. However, that is not quite true, as
we see upon examining the right-hand side. The first
term on the right-hand side, the “1,” corresponds to
the contribution of the squares of the primes in (7).
The subsequent terms correspond to the terms involving
the zeros of $\zeta(s)$ in (7); these terms have denominator
$\gamma$, so the most significant terms in this sum are
those with the smallest values of $\gamma$. Moreover, each of
these terms is a sine wave, which oscillates, half the
time positive and half the time negative. Having the
“log x” in there means that these oscillations happen
slowly (which is why we hardly notice them in the table
above), but they do happen, and indeed the quantity in
(10) does eventually get negative. No one has yet determined
a value of x for which this is negative (that is, a
value of x for which there are more than $\int_2^x (1/\log t)\,dt$
primes up to x), though our best guess is that the first
time this happens is for

$$x \approx 1.398 \times 10^{316}.$$

How does one arrive at such a guess given that the table
of primes extends only up to $10^{22}$? One begins by using
the first thousand terms of the right-hand side of (10)
to approximate the left-hand side; wherever it looks
as though it could be negative, one approximates with
more terms, maybe a million, until one becomes pretty
certain that the value is indeed negative.

It is not uncommon to try to understand a given function
better by representing it as a sum of sines and
cosines like this; indeed this is how one studies the
harmonics in music, and (10) becomes quite compelling
from this perspective. Some experts suggest that (10)
tells us that “the primes have music in them” and
thus makes the Riemann hypothesis believable, even
desirable.

To prove unconditionally that

$$\#\{\text{primes} \le x\} \sim \int_2^x \frac{dt}{\log t},$$

the so-called prime number theorem, we can take the
same approach as above but, since we are not asking for
such a strong approximation to the number of primes
up to x, we need to show only that the zeros near to
the line Re(s) = 1 do not contribute much to the formula
(7). By the end of the nineteenth century this task
had been reduced to showing that there are no zeros
actually on the line Re(s) = 1: this was eventually established
by de la Vallée Poussin [VI.67] and Hadamard
[VI.65] in 1896.

Subsequent research has provided wider and wider 
subregions of the critical strip without zeros of $\zeta(s)$
(and thus improved approximations to the number of 
primes up to x), without coming anywhere near to 
proving the Riemann hypothesis. This remains as an 
outstanding open problem of mathematics. 

A simple question like “How many primes are there
up to x?” deserves a simple answer, one that uses elementary
methods rather than all of these methods of
complex analysis, which seem far from the question at
hand. However, (7) tells us that the prime number theorem
is true if and only if there are no zeros of $\zeta(s)$
on the line Re(s) = 1, and so one might argue that it
is inevitable that complex analysis must be involved in
such a proof. In 1949 Selberg and Erdős surprised the
mathematical world by giving an elementary proof of
the prime number theorem. Here, the word “elementary”
does not mean “easy” but merely that the proof
does not use advanced tools such as complex analysis;
in fact, their argument is a complicated one. Of course
their proof must somehow show that there is no zero
on the line Re(s) = 1, and indeed their combinatorics
cunningly masks a subtle complex analysis proof
beneath the surface (read Ingham’s discussion (1949)
for a careful examination of the argument).

4 Primes in Arithmetic Progressions 

After giving good estimates for the number of primes
up to x, which from now on we shall denote by $\pi(x)$,
we might ask for the number of such primes that are
congruent to a mod q. (If you do not know what this
means, see modular arithmetic [III.60].) Let us write
$\pi(x; q, a)$ for this quantity. To start with, note that
there is only one prime congruent to 2 mod 4, and
indeed there can be no more than one prime in any
arithmetic progression a, a + q, a + 2q, ... if a and q have
a common factor greater than 1. Let $\phi(q)$ denote the number
of integers a, $1 \le a \le q$, such that (a, q) = 1. (The
notation (a, q) stands for the highest common factor
of a and q.) Then all but a small finite number of the
infinitely many primes belong to the $\phi(q)$ arithmetic
progressions a, a + q, a + 2q, ... with $1 \le a < q$ and
(a, q) = 1. Calculation reveals that the primes seem to
be pretty evenly split between these $\phi(q)$ arithmetic
progressions, so we might guess that in the limit the
proportion of primes in each of them is $1/\phi(q)$. That
is, whenever (a, q) = 1, we might conjecture that, as
$x \to \infty$,

$$\frac{\pi(x; q, a)}{\pi(x)} \to \frac{1}{\phi(q)}. \tag{11}$$
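
The "calculation" referred to here is easy to repeat. The Python sketch
below (helper names are ours) tallies the primes up to x by their
residue mod q, keeping only the residue classes coprime to q; the
classes are indexed by residues 0, ..., q - 1 rather than 1, ..., q.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Primes <= limit (limit >= 1), by a basic sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def prime_counts_by_residue(x: int, q: int) -> dict[int, int]:
    """Map each residue a coprime to q to the number of primes p <= x with p = a mod q."""
    counts = {a: 0 for a in range(q) if math.gcd(a, q) == 1}
    for p in primes_up_to(x):
        residue = p % q
        if residue in counts:  # skips the finitely many primes dividing q
            counts[residue] += 1
    return counts


if __name__ == "__main__":
    for q in (3, 4, 10):
        print(q, prime_counts_by_residue(10**6, q))
```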

It is far from obvious even that the number of primes
congruent to a mod q is infinite. This is a famous theorem
of Dirichlet [VI.36]. To begin to consider such
questions we need a systematic way to identify integers
n that are congruent to a mod q, and this Dirichlet provided
by introducing a class of functions now known
as (Dirichlet) characters. Formally, a character mod q
is a function $\chi$ from $\mathbb{Z}$ to $\mathbb{C}$ with the following three
properties (in ascending order of interest):


(i) $\chi(n) = 0$ whenever n and q have a common factor
greater than 1;

(ii) $\chi$ is periodic mod q (that is, $\chi(n + q) = \chi(n)$ for
every integer n);

(iii) $\chi$ is multiplicative (that is, $\chi(mn) = \chi(m)\chi(n)$ for
any two integers m and n).


An easy but important example of a character mod q
is the principal character $\chi_q$, which takes the value 1 if
(n, q) = 1 and 0 otherwise. If q is prime, then another
important example is the Legendre symbol $(\frac{\cdot}{q})$: one sets
$(\frac{n}{q})$ to be 0 if n is a multiple of q, 1 if n is a quadratic
residue mod q, and -1 if n is a quadratic nonresidue
mod q. (An integer n is called a quadratic residue mod q
if n is congruent mod q to a perfect square.) If q is composite,
then a function known as the Legendre-Jacobi
symbol $(\frac{\cdot}{q})$, which generalizes the Legendre symbol,
is also a character. This too is an important example
that helps us, in a slightly less direct way, to recognize
squares mod q.

These characters are all real-valued, which is the
exception rather than the rule. Here is an example
of a genuinely complex-valued character in the case
q = 5. Set $\chi(n)$ to be 0 if $n \equiv 0 \pmod 5$, i if $n \equiv 2$,
-1 if $n \equiv 4$, -i if $n \equiv 3$, and 1 if $n \equiv 1$. To
see that this is a character, note that the powers of
2 mod 5 are 2, 4, 3, 1, 2, 4, 3, 1, ..., while the powers of
i are i, -1, -i, 1, i, -1, -i, 1, ....
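
The three defining properties can be verified mechanically for this
example. Here is a minimal Python sketch (the names are ours) that
stores the five values and checks periodicity and multiplicativity over
a range of integers; Python writes the imaginary unit i as `1j`.

```python
# Values of the character on the residues 0, 1, 2, 3, 4 mod 5.
CHI_VALUES = {0: 0, 1: 1, 2: 1j, 3: -1j, 4: -1}


def chi(n: int) -> complex:
    """The complex-valued character mod 5 described above."""
    return CHI_VALUES[n % 5]


def is_character_on(values: range) -> bool:
    """Check properties (ii) and (iii) for all m, n in the given range."""
    periodic = all(chi(n + 5) == chi(n) for n in values)
    multiplicative = all(chi(m * n) == chi(m) * chi(n) for m in values for n in values)
    return periodic and multiplicative


if __name__ == "__main__":
    print([chi(n) for n in range(1, 6)])
    print(is_character_on(range(-20, 21)))
```

Property (i) holds by construction, since the only integers sharing a
factor with 5 are its multiples, where the value is 0.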

It can be shown that there are precisely $\phi(q)$ distinct
characters mod q. Their usefulness to us comes from
the properties above, together with the following formula,
in which the sum is over all characters mod q
and $\bar\chi(a)$ denotes the complex conjugate of $\chi(a)$:

$$\frac{1}{\phi(q)} \sum_{\chi} \bar\chi(a)\,\chi(n) =
\begin{cases}
1 & \text{if } n \equiv a \pmod q, \\
0 & \text{otherwise.}
\end{cases}$$

What is this formula doing for us? Well, understanding
the set of integers congruent to a mod q is equivalent
to understanding the function that takes the value
1 if $n \equiv a \pmod q$ and 0 otherwise. This function
appears on the right-hand side of the formula. However,
it is not a particularly nice function to deal with,
so we write it as a linear combination of characters,
which are much nicer functions because they are multiplicative.
The coefficient associated with the character
$\chi$ in this linear combination is the number $\bar\chi(a)/\phi(q)$.
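
For q = 5 the formula can be checked exhaustively. The Python sketch
below rests on an assumption worth stating: since 2 generates the
nonzero residues mod 5, the four characters mod 5 can be listed as
$\chi_k(2^j) = i^{jk}$ for k = 0, 1, 2, 3 (so k = 0 is the principal
character and k = 1 is the complex example given earlier). The names
are ours.

```python
POWERS_OF_I = [1, 1j, -1, -1j]            # i^0, i^1, i^2, i^3
DISCRETE_LOG = {1: 0, 2: 1, 4: 2, 3: 3}   # j such that 2^j = r (mod 5)


def character(k: int, n: int) -> complex:
    """The k-th character mod 5: chi_k(2^j) = i^(j*k), and 0 on multiples of 5."""
    r = n % 5
    if r == 0:
        return 0
    return POWERS_OF_I[(k * DISCRETE_LOG[r]) % 4]


def indicator_via_characters(n: int, a: int) -> complex:
    """(1/phi(5)) * sum over chi of conj(chi(a)) * chi(n), with phi(5) = 4."""
    total = sum(character(k, a).conjugate() * character(k, n) for k in range(4))
    return total / 4


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, indicator_via_characters(n, 3))
```

The printed values are 1 exactly when n is congruent to 3 mod 5 and 0
otherwise, which is the content of the formula for a = 3.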

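From the formula, it follows that

$$\sum_{\substack{p\ \text{prime},\ m \ge 1,\ p^m \le x \\ p^m \equiv a \pmod q}} \log p
  = \frac{1}{\phi(q)} \sum_{\chi \pmod q} \bar\chi(a) \sum_{\substack{p\ \text{prime},\ m \ge 1 \\ p^m \le x}} \chi(p^m) \log p.$$

The sum on the left-hand side is a natural adaptation of
the sum we considered earlier when we were counting
all primes. And we can estimate it if we can get good
estimates for each of the sums

$$\sum_{\substack{p\ \text{prime},\ m \ge 1 \\ p^m \le x}} \chi(p^m) \log p.$$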

We approach these sums much as we did before, obtaining
an explicit formula, analogous to (7) and (10), now in
terms of the zeros of the Dirichlet L-function:

$$L(s, \chi) := \sum_{n \ge 1} \frac{\chi(n)}{n^s}.$$

This function turns out to have properties closely
analogous to the main properties of $\zeta(s)$. In particular,
it is here that the multiplicativity of $\chi$ is all-important,
since it gives us a formula similar to (1):

$$\sum_{n \ge 1} \frac{\chi(n)}{n^s} = \prod_{p\ \text{prime}} \Big(1 - \frac{\chi(p)}{p^s}\Big)^{-1}. \tag{12}$$

That is, $L(s, \chi)$ has an Euler product. We also believe
the “generalized Riemann hypothesis” that all zeros $\rho$
of $L(\rho, \chi) = 0$ in the critical strip satisfy $\mathrm{Re}(\rho) = \tfrac{1}{2}$.
This would imply that the number of primes up to x
that are congruent to a mod q can be estimated as

$$\pi(x; q, a) = \frac{\pi(x)}{\phi(q)} + O(\sqrt{x}\,\log^2(qx)). \tag{13}$$

Therefore, the generalized Riemann hypothesis implies
the estimate we were hoping for (formula (11)), provided
that x is a little bigger than $q^2$.

In what range can we prove (11) unconditionally,
that is, without the help of the generalized Riemann
hypothesis? Although we can more or less translate the
proof of the prime number theorem over into this new
setting, we find that it gives (11) only when x is very
large. In fact, x has to be bigger than an exponential
in a power of q, which is a lot bigger than the “x is a
little larger than $q^2$” that we obtained from the generalized
Riemann hypothesis. We see a new type of problem
emerging here, in which we are asking for a good starting
point for the range of x for which we obtain good
estimates, as a function of the modulus q; this does not
have an analogy in our exploration of the prime number
theorem. By the way, even though this bound “x is a
little larger than $q^2$” is far out of reach of current methods,
it still does not seem to be the best answer; calculations
reveal that (11) seems to hold when x is just
a little bigger than q. So even the Riemann hypothesis
and its generalizations are not powerful enough to tell
us the precise behavior of the distribution of primes.

Throughout the twentieth century much thought was
put in to bounding the number of zeros of Dirichlet
L-functions near to the 1-line. It turns out that one can
make enormous improvements in the range of x for
which (11) holds (to “halfway between polynomial in
q and exponential in q”) provided there are no Siegel
zeros. These putative zeros $\beta$ of $L(s, (\frac{\cdot}{q}))$ would be real
numbers with $\beta > 1 - c/\sqrt{q}$; they can be shown to be
extremely rare if they exist at all.

That Siegel zeros are rare is a consequence of 
the Deuring-Heilbronn phenomenon: that zeros of L- 
functions [III.49] repel each other, rather like simi- 
larly charged particles. (This phenomenon is akin to 
the fact that different algebraic numbers repel one
another, part of the basis of the subject of Diophantine 
approximation.) 

How big is the smallest prime congruent to a mod q 
when (a,q) = 1? Despite the possibility of the existence 
of Siegel zeros, one can prove that there is always such 
a prime less than $q^{5.5}$ if q is sufficiently large. Obtain-
ing a result of this type is not difficult when there are
no Siegel zeros. If there are Siegel zeros, then we go
back to the explicit formula, which is similar to (7) but
now concerns zeros of $L(s, \chi)$. If $\beta$ is a Siegel zero, then
it turns out that in the explicit formula there are now
two obviously large terms: $x/\phi(q)$ and $-(\tfrac{a}{q})\,x^{\beta}/(\beta\phi(q))$.
When $(\tfrac{a}{q}) = 1$ it appears that they might almost cancel
(since $\beta$ is close to 1), but with more care we obtain

$$x - \frac{x^{\beta}}{\beta} = (x - x^{\beta}) + x^{\beta}\Bigl(1 - \frac{1}{\beta}\Bigr) \sim x(1 - \beta)\log x.$$

This is a smaller main term than before, but it is not 
too hard to show that it is bigger than the contribu- 
tions of all of the other zeros combined, because the 
Deuring-Heilbronn phenomenon implies that the Siegel 
zero repels those zeros, forcing them to be far to the 
left. When $(\tfrac{a}{q}) = -1$, the same two terms tell us that
if $(1 - \beta)\log x$ is small, then there are twice as many
primes as we would expect up to x that are congruent 
to a mod q. 

There is a close connection between Siegel zeros 
and class numbers, which are defined and discussed in 
algebraic numbers [IV.1 §7]. Dirichlet’s class number
formula states that $L(1, (\tfrac{\cdot}{q})) = \pi h_{-q}/\sqrt{q}$ for q > 6,
where $h_{-q}$ is the class number of the field $\mathbb{Q}(\sqrt{-q})$. A
class number is always a positive integer, so this result
immediately implies that $L(1, (\tfrac{\cdot}{q})) \ge \pi/\sqrt{q}$. Another
consequence is that $h_{-q}$ is small if and only if $L(1, (\tfrac{\cdot}{q}))$
is small. The reason this gives us information about
Siegel zeros is that one can show that the derivative
$L'(\sigma, (\tfrac{\cdot}{q}))$ is positive (and not too small) for real num-
bers $\sigma$ close to 1. This implies that $L(1, (\tfrac{\cdot}{q}))$ is small
if and only if $L(s, (\tfrac{\cdot}{q}))$ has a real zero close to 1, that
is, a Siegel zero $\beta$. When $h_{-q} = 1$, the link is more
direct: it can be shown that the Siegel zero $\beta$ is approx-
imately $1 - 6/(\pi\sqrt{q})$. (There are also more complicated
formulas for larger values of $h_{-q}$.)

These connections show that getting good lower 
bounds on $h_{-q}$ is equivalent to getting good bounds
on the possible range for Siegel zeros. Siegel showed
that for any $\varepsilon > 0$ there exists a constant $c_\varepsilon > 0$
such that $L(1, (\tfrac{\cdot}{q})) > c_\varepsilon q^{-\varepsilon}$. His proof was unsatis-
factory because by its very nature one cannot give an
explicit value for $c_\varepsilon$. Why not? Well, the proof comes
in two parts. The first assumes the generalized Rie-
mann hypothesis, in which case an explicit bound fol-
lows easily. The second obtains a lower bound in terms
of the first counterexample to the generalized Riemann
hypothesis. So if the generalized Riemann hypothesis is 
true but remains unproved, then Siegel’s proof cannot 
be exploited to give explicit bounds. This dichotomy, 
between what can be proved with an explicit constant 
and what cannot be, is seen far and wide in analytic 
number theory— and when it appears it usually stems 
from an application of Siegel’s result, and especially its 
consequences for the range in which the estimate (11)
holds.

A polynomial with integer coefficients cannot always
take on prime values when we substitute in an inte-
ger. To see this, note that if p divides f(m) then p
also divides f(m + p), f(m + 2p), .... However, there
are some prime-rich polynomials, a famous example
being the polynomial $x^2 + x + 41$, which is prime for
x = 0, 1, 2, . . . , 39. There are almost certainly quadratic 
polynomials that take on more consecutive prime val- 
ues, though their coefficients would have to be very 
large. If we ask the more restricted question of when the 
polynomial $x^2 + x + p$ is prime for x = 0, 1, 2, ..., p − 2,
then the answer, given by Rabinowitch, is rather sur-
prising: it happens if and only if $h_{-q} = 1$, where q =
4p − 1. Gauss did extensive calculations of class num-
bers and predicted that there are just nine values of q
with $h_{-q} = 1$, the largest of which is 163 = 4 × 41 − 1.
Using the Deuring-Heilbronn phenomenon researchers 
showed, in the 1930s, that there is at most one q with 
$h_{-q} = 1$ that is not already on Gauss’s list; but as usual
with such methods, one could not give a bound on the 
size of the putative extra counterexample. It was not 
until the 1960s that Baker and Stark proved that there 
was no tenth q, both proofs involving techniques far 
removed from those here (in fact Heegner gave what
we now understand to have been a correct proof in the
1950s but he was so far ahead of his time that it was dif-
ficult for mathematicians to appreciate his arguments
and to believe that all of the details were correct). In the
1980s Goldfeld, Gross, and Zagier gave the best result
to date, showing that $h_{-q} \ge \tfrac{1}{7700}\log q$, this time using
the Deuring-Heilbronn phenomenon with the zeros of
yet another type of L-function to repel the zeros of
$L(s, (\tfrac{\cdot}{q}))$.
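
The prime-rich polynomials mentioned above are easy to explore numerically. The following is a minimal sketch, not part of the original argument: the helper names `is_prime`, `prime_run_length`, and `rabinowitch_values` are hypothetical, and trial division is used only because the numbers involved are tiny.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_run_length(p: int) -> int:
    """Number of consecutive x = 0, 1, 2, ... for which x^2 + x + p is prime.

    The loop always stops: at x = p - 1 the value is p^2, which is composite.
    """
    x = 0
    while is_prime(x * x + x + p):
        x += 1
    return x


def rabinowitch_values(limit: int) -> list[int]:
    """All p <= limit with x^2 + x + p prime for every x = 0, ..., p - 2."""
    return [p for p in range(2, limit + 1) if prime_run_length(p) >= p - 1]
```

For each p returned by `rabinowitch_values`, the number q = 4p − 1 should, by Rabinowitch's criterion as stated above, satisfy $h_{-q} = 1$; the sketch only tests primality and does not compute any class number.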

This idea that primes are well-distributed in arith- 
metic progressions except for a few rare moduli was 
exploited by Bombieri and Vinogradov to prove that 
(11) holds “almost always” when x is a little bigger than 
q 2 (that is, in the same range that we get “always” from 
the generalized Riemann hypothesis). More precisely, 
for given large x we have that (11) holds for “almost 
all” q less than $\sqrt{x}/(\log x)^2$ and for all a such that
(a, q) = 1. “Almost all” means that, out of all q less
than $\sqrt{x}/(\log x)^2$, the proportion for which (11) does
not hold for every a with (a, q) = 1 tends to 0 as $x \to \infty$.
Thus, the possibility is not ruled out that there are 
infinitely many counterexamples. However, since this 
would contradict the generalized Riemann hypothesis, 
we do not believe that it is so. 

The Barban-Davenport-Halberstam theorem gives a 
weaker result, but it is valid for the whole feasible
range: for any given large x, the estimate (11) holds
for “almost all” pairs q and a such that $q \le x/(\log x)^2$
and (a, q) = 1.

5 Primes in Short Intervals 

Gauss’s prediction referred to the primes “around” x, 
so it perhaps makes more sense to interpret his state- 
ment by considering the number of primes in short 
intervals at around x. If we believe Gauss, then we 
might expect the number of primes between x and 
x + y to be about y / logx. That is, in terms of the 
prime-counting function $\pi$, we might expect that

$$\pi(x + y) - \pi(x) \sim \frac{y}{\log x} \tag{14}$$

for |y| < x/2. However, we have to be a little careful
about the range for y. For example, if $y = \tfrac{1}{2}\log x$, then
we certainly cannot expect to have half a prime in each
interval. Obviously we need y to be large enough that
the prediction can be interpreted in a way that makes
sense; indeed, the Gauss-Cramér model suggests that
(14) should hold when |y| is a little bigger than $(\log x)^2$.

If we attempt to prove (14) using the same methods 
we used in the proof of the prime number theorem, 
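we find ourselves bounding differences between $\rho$th
powers as follows:

$$\left|\frac{(x+y)^{\rho} - x^{\rho}}{\rho}\right| = \left|\int_x^{x+y} t^{\rho-1}\,dt\right| \le \int_x^{x+y} t^{\mathrm{Re}(\rho)-1}\,dt \le y\,(x+y)^{\mathrm{Re}(\rho)-1}.$$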

With bounds on the density of zeros of $\zeta(s)$ well to
the right of $\tfrac{1}{2}$, it has been shown that (14) holds for y
a little bigger than $x^{7/12}$; but there is little hope, even
assuming the Riemann hypothesis, that such methods
will lead to a proof of (14) for intervals of length $\sqrt{x}$ or
less. 

In 1949 Selberg showed that (14) is true for “almost 
all” x when |y| is a little bigger than $(\log x)^2$, assum-
ing the Riemann hypothesis. Once again, “almost all”
means with density tending to 1, rather than “all,” and 
it is feasible that there are infinitely many counter- 
examples, though at that time it seemed highly unlikely. 
It therefore came as a surprise when Maier showed, in 
1984, that, for any fixed A > 0, the estimate (14) fails 
for infinitely many integers x, with $y = (\log x)^A$. His
ingenious proof rests on showing that the small primes 
do not always have as many multiples in an interval as 
one might expect. 
Table 2 The largest known gaps between primes.

$p_n$                    $p_{n+1} - p_n$    $(p_{n+1} - p_n)/\log^2 p_n$
113                      14                 0.6264
1327                     34                 0.6576
31397                    72                 0.6715
370261                   112                0.6812
2010733                  148                0.7026
20831323                 210                0.7395
25056082087              456                0.7953
2614941710599            652                0.7975
19581334192423           766                0.8178
218209405436543          906                0.8311
1693182318746371         1132               0.9206


Let $p_1 = 2 < p_2 = 3 < \cdots$ be the sequence
of primes. We are now interested in the size of the
gaps $p_{n+1} - p_n$ between consecutive primes. Since
there are about x/logx primes up to x, the average 
difference is logx and we might ask how often the 
difference between consecutive primes is about aver- 
age, whether the differences can get really small, and 
whether the differences can get really large. The Gauss- 
Cramér model suggests that the proportion of n for 
which the gap between consecutive primes is more than 
A times the average, that is $p_{n+1} - p_n > A\log p_n$, is
approximately $e^{-A}$; and, similarly, the proportion of
intervals $[x, x + A\log x]$ containing exactly k primes
is approximately $e^{-A}A^k/k!$, a suggestion which, as we
shall see, is supported by other considerations. By
looking at the tail of this distribution, Cramér conjec-
tured that $\limsup_{n\to\infty}(p_{n+1} - p_n)/(\log p_n)^2 = 1$, and
the evidence we have seems to support this (see table 2). 
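
The ratio in the last column of table 2 can be recomputed for small primes. The sketch below is illustrative only: `primes_upto` and `record_gaps` are hypothetical helper names, and a simple sieve restricts it to the first few rows of the table.

```python
from math import log


def primes_upto(n):
    """Sieve of Eratosthenes: list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def record_gaps(limit):
    """List (p_n, gap, gap / log(p_n)^2) each time the gap sets a new record."""
    out = []
    best = 0
    ps = primes_upto(limit)
    for p, q in zip(ps, ps[1:]):
        if q - p > best:
            best = q - p
            out.append((p, best, best / log(p) ** 2))
    return out
```

Calling `record_gaps(40000)` lists every record-setting gap below 40000, which includes the rows of table 2 for 113, 1327, and 31397 together with smaller records that the table omits.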

The Gauss-Cramér model does have a big drawback: 
it does not “know any arithmetic.” In particular, as we 
noted earlier, it does not predict divisibility by small 
primes. One manifestation of this failing is that it pre-
dicts that there should be just about as many gaps
of length 1 between primes as there are of length 2.
However, there is only one gap of length 1, since if
two primes differ by 1, then one of them must be
even, whereas there are many examples of pairs of
primes differing by 2, and there are believed to be
infinitely many. For the model to make correct conjec-
tures about prime pairs, we must consider divisibility
by small primes in the formulation of the model, which
makes it rather more complicated. Since there are these 
glaring errors in the simpler model, Cramér’s conjec-
ture for the largest gaps between consecutive primes
must be treated with a degree of suspicion. And in
fact, if one corrects the model to account for divis-
ibility by small primes, one is led to conjecture that
$\limsup_{n\to\infty}(p_{n+1} - p_n)/(\log p_n)^2$ is greater than $\tfrac{9}{8}$.

Finding large gaps between primes is equivalent to 
finding long sequences of composite numbers. How 
about trying to do this explicitly? For example, we know 
that n! + j is composite for $2 \le j \le n$, as it is divisi-
ble by j. Therefore we have a gap of length at least
n between consecutive primes, the first of which is
the largest prime less than or equal to n! + 1. How-
ever, this observation is not especially helpful, since
the average gap between primes around n! is log(n!),
which is approximately equal to n logn, whereas we 
are looking for gaps that are larger than the average. 
However, it is possible to generalize this argument and 
show that there are indeed long sequences of consec- 
utive integers, each with a small prime factor. In the 
1930s, Erdős reformulated the question as follows. Fix
a positive integer z, and for each prime $p \le z$ choose
an integer $a_p$ in such a way that, for as large an integer
y as possible, every positive integer $n \le y$ satisfies at
least one of the congruences $n \equiv a_p \pmod{p}$. Now let
X be the product of all the primes up to z (which means,
by the prime number theorem, that log X is about z),
and let x be the integer between X and 2X such that
$x \equiv -a_p \pmod{p}$ for every $p \le z$. (This integer exists,
by the Chinese remainder theorem.) If m is an integer
between x + 1 and x + y, then m − x is a positive
integer at most y, so $m - x \equiv a_p \pmod{p}$ for some
prime $p \le z$. Since $x \equiv -a_p \pmod{p}$, it follows that
m is divisible by p. Thus, all the integers from x + 1
to x + y are composite. Using this basic idea, it can
be shown that there are infinitely many primes $p_n$ for
which $p_{n+1} - p_n$ is about $(\log p_n)(\log\log p_n)$, which is
significantly larger than the average but nowhere close
to Cramér’s conjecture.
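
The covering-congruence construction can be traced on a small example. In the sketch below the residues $a_p$ are chosen by a deliberately naive rule (each prime takes the class of the smallest integer not yet covered); this rule is an assumption made for illustration and is far weaker than the choices needed for the result quoted above. The function names are hypothetical.

```python
def primes_upto(z):
    """Primes <= z by trial division (z is small here)."""
    return [p for p in range(2, z + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def choose_residues(z, limit):
    """Naive greedy choice of a_p for each prime p <= z.

    Returns (residues, y), where every n in 1..y satisfies n = a_p (mod p)
    for at least one chosen p. `limit` only caps the search range.
    """
    uncovered = set(range(1, limit + 1))
    residues = {}
    for p in primes_upto(z):
        if not uncovered:
            break
        a = min(uncovered) % p          # cover the smallest uncovered integer
        residues[p] = a
        uncovered = {n for n in uncovered if n % p != a}
    y = (min(uncovered) - 1) if uncovered else limit
    return residues, y


def crt_start(residues):
    """Return (x, X) with X the product of the primes used, X <= x < 2X,
    and x = -a_p (mod p) for every p, built up one prime at a time."""
    X, x = 1, 0
    for p, a in residues.items():
        t = ((-a - x) * pow(X, -1, p)) % p   # modular inverse needs Python 3.8+
        x += X * t
        X *= p
    return x + X, X
```

With z = 13 this naive rule gives y = 15 and x = 30031, so the integers 30032, ..., 30046 each have a prime factor at most 13. A smarter choice of residues covers a longer initial segment for the same z.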

6 Gaps between Primes that Are 
Smaller than the Average 

We have just seen how to show that there are in-
finitely many pairs of consecutive primes whose dif-
ference is much bigger than the average: that is,
$\limsup_{n\to\infty}(p_{n+1} - p_n)/(\log p_n) = \infty$. We would now
like to show that there are infinitely many pairs of con-
secutive primes whose difference is much smaller than
the average: that is, $\liminf_{n\to\infty}(p_{n+1} - p_n)/(\log p_n) = 0$.
Of course, it is believed that there are infinitely many
pairs of primes that differ by 2, but this question seems 
intractable for now. 
Until recently researchers had very little success with
the question of small gaps; the best result before 2000
was that there are infinitely many gaps of size less
than one-quarter of the average. However, a recent
method of Goldston, Pintz, and Yildirim, which counts
primes in short intervals with simple weighting func-
tions, proves that $\liminf_{n\to\infty}(p_{n+1} - p_n)/(\log p_n) = 0$,
and even that there are infinitely many pairs of con-
secutive primes with difference no larger than about
$\sqrt{\log p_n}$. Their proof, rather surprisingly, rests on esti-
mates for primes in arithmetic progressions; in par-
ticular, that (11) holds for almost all q up to $\sqrt{x}$ (as
discussed earlier). Moreover, they obtain a conditional
result of the following kind: if in fact (11) holds for
almost all q up to a little larger than $\sqrt{x}$, then it follows
that there exists an integer B such that $p_{n+1} - p_n \le B$
for infinitely many primes $p_n$.

7 Very Small Gaps between Primes 

There appear to be many pairs of primes that differ by 
two, like 3 and 5, 5 and 7, . . . , the so-called twin primes, 
though no one has yet proved that there are infinitely 
many. In fact, for every even integer 2k there seem to
be many pairs of primes that differ by 2k, but again no
one has yet proved that there are infinitely many. This 
is one of the outstanding problems in the subject. 

In a similar vein is Goldbach’s conjecture from the 
1760s: is it true that every even integer greater than 2 
is the sum of two primes? This is still an open ques- 
tion, and indeed a publisher recently offered a million 
dollars for its solution. We know it is true for almost 
all integers, and it has been computer tested for every 
even integer up to $4 \times 10^{14}$. The most famous result on
this question is due to Chen (1966), who showed that 
every even integer can be written as the sum of a prime 
and a second integer that has at most two prime factors 
(that is, it could be a prime or an “almost-prime”). 
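
On a very small scale, the computer testing mentioned above can be imitated directly. This is only a sketch with hypothetical helper names (`prime_flags`, `goldbach_counts`); it checks a tiny range and says nothing about the general conjecture.

```python
def prime_flags(n):
    """flags[i] == 1 exactly when i is prime, for 0 <= i <= n."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def goldbach_counts(limit):
    """Map each even n in [4, limit] to the number of unordered
    prime pairs p <= q with p + q = n."""
    flags = prime_flags(limit)
    return {
        n: sum(1 for p in range(2, n // 2 + 1) if flags[p] and flags[n - p])
        for n in range(4, limit + 1, 2)
    }
```

Every count returned for even n up to the chosen limit should be at least 1 if the conjecture holds in that range; for example n = 100 has the six representations 3 + 97, 11 + 89, 17 + 83, 29 + 71, 41 + 59, and 47 + 53.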

In fact, Goldbach [VI.17] never asked this question.
He asked Euler, in a letter in the 1760s, whether every 
integer greater than 1 can be written as the sum of at 
most three primes, which would imply what we now call 
the “Goldbach conjecture.” In the 1920s Vinogradov 
showed that every sufficiently large odd integer can be 
written as the sum of three primes (and thus every suf- 
ficiently large even integer can be written as the sum of 
four primes). We actually believe that every odd inte- 
ger greater than 5 is the sum of three primes but the 
known proofs only work once the numbers involved are
large enough. In this case we can be explicit about “suf-
ficiently large”: at the moment the proof needs them
to be at least $e^{5700}$, but it is rumored that this may soon
be substantially reduced, perhaps even to 7.

To guess at the precise number of prime pairs 
q, q + 2 with $q \le x$ we proceed as follows. If we do
not consider divisibility by the small primes, then the
Gauss-Cramér model suggests that a random integer
up to x is prime with probability roughly 1/log x, so
we might expect $x/(\log x)^2$ prime pairs q, q + 2 up
to x. However, we do have to account for the small
primes, as the q, q + 1 example shows, so let us con-
sider 2-divisibility. The proportion of random pairs of
integers that are both odd is $\tfrac{1}{4}$, whereas the proportion
of random q such that q and q + 2 are both odd is $\tfrac{1}{2}$.
Thus we should adjust our guess $x/(\log x)^2$ by a factor
$\tfrac{1}{2}/\tfrac{1}{4} = 2$. Similarly, the proportion of random pairs
of integers that are both not divisible by 3 (or indeed by
any given odd prime p) is $(\tfrac{2}{3})^2$ (and $(1 - 1/p)^2$, respec-
tively), whereas the proportion of random q such that
q and q + 2 are both not divisible by 3 (or by prime p) is
$\tfrac{1}{3}$ (and $(1 - 2/p)$, respectively). Adjusting our formula
for each prime p we end up with the prediction
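
$$\#\{q \le x : q \text{ and } q + 2 \text{ both prime}\} \sim 2 \prod_{p \text{ an odd prime}} \frac{(1 - 2/p)}{(1 - 1/p)^2} \cdot \frac{x}{(\log x)^2}.$$

This is known as the asymptotic twin-prime conjecture.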
Despite its plausibility there do not seem to be any 
practical ideas around for turning the heuristic argu- 
ment above into something rigorous. The one good 
unconditional result known is that the number of twin 
primes less than or equal to x is never more than four 
times the quantity we have just predicted. One can 
make a more precise prediction replacing $x/(\log x)^2$
by $\int_2^x dt/(\log t)^2$, and then we expect that the dif-
ference between the two sides is no more than $c\sqrt{x}$
for some constant c > 0, a guesstimate that is well
supported by computational evidence. 
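
The heuristic can be compared with actual counts. The sketch below is an illustration under stated assumptions: the product over odd primes is truncated at a finite bound, the integral is approximated by a midpoint rule, and all function names are hypothetical.

```python
from math import log


def prime_flags(n):
    """flags[i] == 1 exactly when i is prime, for 0 <= i <= n."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def twin_prime_count(x):
    """Number of primes q <= x such that q + 2 is also prime."""
    flags = prime_flags(x + 2)
    return sum(1 for q in range(2, x + 1) if flags[q] and flags[q + 2])


def twin_prime_constant(bound):
    """Product of (1 - 2/p) / (1 - 1/p)^2 over odd primes p <= bound (truncated)."""
    flags = prime_flags(bound)
    c = 1.0
    for p in range(3, bound + 1, 2):
        if flags[p]:
            c *= (1 - 2 / p) / (1 - 1 / p) ** 2
    return c


def predicted_twin_primes(x, bound=10_000, steps=100_000):
    """2 * C * integral from 2 to x of dt / (log t)^2, via the midpoint rule."""
    h = (x - 2) / steps
    integral = h * sum(1 / log(2 + (i + 0.5) * h) ** 2 for i in range(steps))
    return 2 * twin_prime_constant(bound) * integral
```

Comparing `twin_prime_count(10**5)` with `predicted_twin_primes(10**5)` shows the two values agreeing to within a few percent, which is the kind of computational support referred to above; it is of course not evidence of a proof.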

A similar method allows us to make predictions for 
the number of primes in any polynomial-type patterns. 
Let $f_1(t), f_2(t), \ldots, f_k(t) \in \mathbb{Z}[t]$ be distinct irreducible
polynomials of degree greater than or equal to 1 with
positive leading coefficient, and define $\omega(p)$ to be the
number of integers n (mod p) for which p divides
$f_1(n) f_2(n) \cdots f_k(n)$. (In the case of twin primes above
we have $f_1(t) = t$, $f_2(t) = t + 2$ with $\omega(2) = 1$ and
$\omega(p) = 2$ for all odd primes p.) If $\omega(p) = p$ then p
always divides at least one of the polynomial values,
so they can be simultaneously prime just finitely often
(an example of this is when $f_1(t) = t$, $f_2(t) = t + 1$, in
which case $\omega(2) = 2$). Otherwise we have an admissi-
ble set of polynomials for which we predict that the
number of integers n less than x for which all of
$f_1(n), f_2(n), \ldots, f_k(n)$ are prime is about

$$\prod_{p \text{ prime}} \frac{(1 - \omega(p)/p)}{(1 - 1/p)^k} \times \frac{x}{\log|f_1(x)|\,\log|f_2(x)| \cdots \log|f_k(x)|} \tag{15}$$

once x is sufficiently large. One can use a similar heuris- 
tic to make predictions in Goldbach’s conjecture, that 
is, for the number of pairs of primes p, q for which 
p + q = 2N. Again, these predictions are very well
matched by the computational evidence. 
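
The quantity $\omega(p)$ and the admissibility condition are straightforward to compute for small primes. The following sketch uses a hypothetical representation (each polynomial as a list of coefficients, constant term first) and hypothetical function names; it checks only the primes it is given.

```python
def omega(polys, p):
    """Number of residues n mod p with p dividing f_1(n) * ... * f_k(n).

    Each polynomial is a coefficient list [c_0, c_1, ...] for c_0 + c_1*t + ...
    """
    def value(coeffs, n):
        return sum(c * pow(n, i, p) for i, c in enumerate(coeffs)) % p

    return sum(1 for n in range(p) if any(value(f, n) == 0 for f in polys))


def is_admissible(polys, primes):
    """True if omega(p) < p for every prime in the supplied list."""
    return all(omega(polys, p) < p for p in primes)
```

For the twin-prime pair t, t + 2 this gives $\omega(2) = 1$ and $\omega(p) = 2$ for odd p, while the pair t, t + 1 fails at p = 2 and the triple t, t + 2, t + 4 fails at p = 3.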

There are just a few cases of conjecture (15) that have 
been proved. Modifications of the proof of the prime 
number theorem give such a result for admissible poly- 
nomials qt + a (in other words, for primes in arithmetic 
progressions) and for admissible $at^2 + btu + cu^2 \in$
$\mathbb{Z}[t, u]$ (as well as some other polynomials in two vari-
ables of degree two). It is also known for a certain type
of polynomial in n variables of degree n (the admissible 
“norm-forms”). 

There was little improvement on this situation dur- 
ing the twentieth century until quite recently, when, 
by very different methods, Friedlander and Iwaniec 
broke through this stalemate showing such a result 
for the polynomial $t^2 + u^4$, and then Heath-Brown did
so for any admissible homogeneous polynomial in two 
variables of degree three. 

Another truly extraordinary breakthrough occurred 
recently with a result of Green and Tao, proved in 2004, 
which states that for every k there are infinitely many
k-term arithmetic progressions of primes: that is, pairs
of integers a, d such that a, a + d, a + 2d, ..., a + (k − 1)d
are all prime. Green and Tao are currently hard at work
attempting to show that the number of k-term arith-
metic progressions of primes is indeed well approxi-
mated by (15). They are also extending their results to
other families of polynomials. 

8 Gaps between Primes Revisited 

In the 1970s Gallagher deduced from the conjectured 
prediction (15) (with $f_j(t) = t + a_j$) that the propor-
tion of intervals $[x, x + A\log x]$ which contain exactly k
primes is close to $e^{-A}A^k/k!$ (as was also deduced, in sec-
tion 5 above, from the Gauss-Cramér heuristics). This
has recently been extended to support the prediction 
that, as we vary x from X to 2X, the number of primes 
in the interval [x, x + y] is normally distributed with
mean $\int_x^{x+y} dt/\log t$ and variance $(1 - \delta)\,y/\log x$,
where $\delta$ is some constant strictly between 0 and 1 and
we take y to be $x^{\delta}$.

When $y > \sqrt{x}$ the Riemann zeta function supplies
information on the distribution of primes in intervals 
[x, x + y) via the explicit formula (7). Indeed, when we 
compute the “variance” 



using the explicit formula we obtain a sum of terms 
of the form $\int_X^{2X} x^{i(\gamma_j - \gamma_k)}\,dx$. Here we are assuming the
Riemann hypothesis and writing the zeros of $\zeta(s)$ as
$\tfrac{1}{2} \pm i\gamma_n$ with $0 < \gamma_1 < \gamma_2 < \cdots$. This sum is domi-
nated by the terms corresponding to those pairs $\gamma_j, \gamma_k$
for which $|\gamma_j - \gamma_k|$ is small (in which case there is lit-
tle cancellation in the integral). Therefore, in order to
understand the variance for the distribution of primes 
in short intervals we need to understand the distribu-
tion of the zeros of $\zeta(s)$ in short intervals. In 1973
Montgomery investigated this and suggested that the
proportion of pairs of zeros of $\zeta(s)$ whose difference is
less than $\alpha$ times the average gap between consecutive
zeros is given by the integral

$$\int_0^{\alpha} \Bigl(1 - \Bigl(\frac{\sin \pi\theta}{\pi\theta}\Bigr)^2\Bigr)\,d\theta, \tag{16}$$

and he proved an equivalent form of this in a lim-
ited range. If the zeros were placed “randomly,” then
(16) would be replaced by $\alpha$. In fact (16) is about $\tfrac{\pi^2}{9}\alpha^3$
for small $\alpha$, which is far smaller than $\alpha$. This means
that there are far fewer pairs of zeros of $\zeta(s)$ that are
close together than one might expect, which we express
informally by saying that the zeros of $\zeta(s)$ repel one
another. 
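
The repulsion can be seen numerically by comparing the pair-correlation integral with the "random" value $\alpha$. The sketch below assumes Montgomery's density $1 - (\sin \pi\theta / \pi\theta)^2$ as the integrand of (16) and uses a plain midpoint rule; the function names are hypothetical.

```python
from math import pi, sin


def pair_density(theta):
    """Assumed integrand of (16): 1 - (sin(pi*theta) / (pi*theta))^2."""
    if theta == 0.0:
        return 0.0
    s = sin(pi * theta) / (pi * theta)
    return 1.0 - s * s


def pair_proportion(alpha, steps=10_000):
    """Midpoint-rule approximation to the integral of pair_density over [0, alpha]."""
    h = alpha / steps
    return h * sum(pair_density((i + 0.5) * h) for i in range(steps))
```

For small $\alpha$ the value of `pair_proportion(alpha)` is close to $\pi^2\alpha^3/9$, the leading term obtained by expanding the sine, and is far below $\alpha$ itself.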

In a now-famous conversation that took place at the 
Institute for Advanced Study in Princeton, Montgomery 
mentioned his ideas to the physicist Freeman Dyson. 
Dyson immediately recognized (16) as a function that 
comes up in modeling energy levels in quantum chaos. 
Believing that this was unlikely to be a coincidence, he 
suggested that the zeros of the Riemann zeta function 
are distributed, in all aspects, like energy levels, which 
are in turn modeled on the distribution of eigenvalues 
[I.3 §4.3] of random hermitian matrices [III.52 §3].
There is now substantial computational and theoret- 
ical evidence that Dyson’s suggestion is correct and can 
be extended to Dirichlet L-functions, as well as other 
types of L-functions, and even to other statistics about 
L-functions. 
One note of caution. Few of the conjectured conse-
quences of this new “random matrix theory” have been
unconditionally proved, or seem likely to be in the fore- 
seeable future. It simply provides a tool to make pre- 
dictions where that was too difficult to do before. How- 
ever, there is at least one key question about which we 
still cannot make a well-substantiated prediction: how 
big does $\zeta(s)$ get on the $\tfrac{1}{2}$-line? One can show that
$\log|\zeta(\tfrac{1}{2} + it)|$ gets larger than $\sqrt{\log T}$ for values of t
close to T, and that it gets no larger than log T. How-
ever, it is unclear, even if we do not insist on a rigorous
proof, whether the true maximal order is nearer the 
upper or lower bound. 

9 Sieve Methods 

Almost all of our discussion so far has been about 
developments of Riemann’s approach to counting 
primes. This approach is very delicate and not as adapt- 
able as one might wish to many natural questions (such 
as counting k-tuples of primes $n + a_1, n + a_2, \ldots, n + a_k$).
However, one can go back to sieve methods, which are 
modifications of the sieve of Eratosthenes, and at least 
get upper bounds. For example, suppose we want to 
find an upper bound for the number of prime pairs 
n, n + 2 with $N < n \le 2N$. One possibility would be
to fix a number y and determine for how many pairs
n, n + 2 with $N < n \le 2N$ it is the case that neither n
nor n + 2 has a prime factor less than y. If we took y to
be $(2N)^{1/2}$, then this method would exactly count the
twin primes, but it seems to be far too difficult to imple- 
ment. But it turns out that if instead we take y to be a 
small power of N, then the calculations become much 
easier and there are ways of obtaining good bounds. 
(However, these bounds become less accurate as the 
power gets closer to $\tfrac{1}{2}$.)

In the 1920s Brun showed how to make the principle 
of inclusion-exclusion into a useful tool in this type of 
question. This principle is hest exhibited when count- 
ing the number of integers n in a set S that are coprime 
to given integer m. We begin with the number of inte- 
gers in S, which is obviously more than the quantity we 
seek. Next, we subtract, for each prime p dividing m, 
the number of integers in S that are divisible by p. If 
n e S is divisible by exactly r prime factors of m, then 
we have counted 1 + r x (-1) for the contribution of n 
so far, which is less than or equal to 0, and less than 0 
for r > 2; whereas we wanted to count 0 when r ^ 2 
(since n is not coprime to m). Thus we obtain a number 
that is less than the quantity we seek. To compensate 


for that, we add back in the number of integers in S 
divisible by pq for each pair of primes p < q which 
divide m. We have now counted 1 + r x (-1) + (£) x 1 
for the contribution of n, which is greater than or equal 
to 0, and greater than 0 for r ^ 3. Similarly, we subtract 
the number of integers divisible by pqr, etc. 

For each $n \in S$ we end up counting $(1 - 1)^r$ for
n, where r is the number of distinct prime factors
of (m, n). Expanding this sum with the binomial the-
orem we may reexpress this identity as follows. Let
$\chi_m(n) = 1$ if (n, m) = 1 and 0 otherwise. Then

$$\chi_m(n) = \sum_{d \mid (m,n)} \mu(d),$$

where $\mu(m)$, the Möbius function, equals 0 if m is
divisible by the square of a prime and equals $(-1)^{\omega(m)}$
otherwise, where $\omega(m)$ is the number of distinct prime
factors of m.

The inclusion-exclusion inequalities just discussed 
may be obtained from 


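$$\sum_{\substack{d \mid (m,n) \\ \omega(d) \le 2k+1}} \mu(d) \le \chi_m(n) \le \sum_{\substack{d \mid (m,n) \\ \omega(d) \le 2k}} \mu(d),$$

which holds for any $k \ge 0$, by summing over all $n \in S$.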
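
Both the Möbius identity and the truncated inequalities can be checked by brute force for small m and n. The sketch below uses hypothetical helper names and relies on the observation that a squarefree d dividing (m, n) with j prime factors has $\mu(d) = (-1)^j$, and that there are $\binom{r}{j}$ such d when (m, n) has r distinct prime factors.

```python
from math import comb, gcd


def prime_factors(n):
    """Distinct prime factors of n >= 1."""
    out, p = [], 2
    while p * p <= n:
        if n % p == 0:
            out.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        out.append(n)
    return out


def truncated_sum(m, n, max_factors=None):
    """Sum of mu(d) over d | gcd(m, n) with at most max_factors prime factors.

    Non-squarefree d contribute 0, so only squarefree d are counted.
    With max_factors=None the full sum over all d is returned.
    """
    r = len(prime_factors(gcd(m, n)))
    top = r if max_factors is None else min(max_factors, r)
    return sum((-1) ** j * comb(r, j) for j in range(top + 1))
```

Truncating at an odd number of prime factors gives a lower bound for $\chi_m(n)$, and truncating at an even number gives an upper bound, in line with the displayed inequalities.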

The reason for using these abbreviated sums rather 
than the complete sum is that there are far fewer terms 
and thus, when one sums over values of n, there will be 
far fewer rounding errors (remember that it was round- 
ing errors that sank our attempt to estimate the number 
of primes up to x using the sieve of Eratosthenes). On 
the other hand, they have the disadvantage that they
cannot possibly give the exact answer, since they are 
missing many appropriate terms. However, with a judi- 
cious choice of k the missing terms do not contribute 
much to the complete sum and we get a good answer. 

Minor variants work well for many questions. In the 
“combinatorial sieve” one selects which d are part of 
the upper and lower bound sums, not by counting the 
total number of prime factors they contain but instead 
using other criteria, such as the numbers of prime fac- 
tors of d in each of several intervals. Using such a 
method, Brun showed that there cannot be too many 
twin primes p, p + 2; indeed, the sum of 1/p, over all
primes p for which p + 2 is also prime, converges, in 
contrast with (3). 

In the “Selberg upper bound sieve” one comes up 
with some numbers $\lambda_d$ that are nonzero only when
d < D (where D is chosen to be not too large), with
the property that

$$\chi_m(n) \le \Bigl(\sum_{d \mid (m,n)} \lambda_d\Bigr)^2 \quad \text{for all } n.$$

Summing over the appropriate n one then finds the
optimal solution by minimizing the resulting quad-
ratic form. Lower bounds can also be obtained out
of Selberg’s methods. It was by using such methods
that Chen was able to prove there are infinitely many
primes p for which p + 2 has at most two prime fac-
tors, and that Goldston, Pintz, and Yildirim were able to
establish that there are sometimes short gaps between
primes. These methods are also an essential ingredient 
in the work of Green and Tao. One can also get good 
upper bounds on the number of primes in arithmetic 
progressions and short intervals: 

• there are never more than 2y/log y primes in any
interval of length y;

• there are never more than $2x/(\phi(q)\log(x/q))$
primes up to x in an arithmetic progression
mod q.

Notice that in each case the log in the denominator 
is of the number of integers being considered (y and 
x/q, respectively), not logx as expected, though this 
will only make a significant difference if the number 
of integers being considered is small. Otherwise these 
inequalities are bigger than the expected quantity by a 
factor of 2. Can this “2” be improved? It will be difficult 
because we showed earlier that if there are Siegel zeros 
then we get twice as many primes as expected in certain 
arithmetic progressions. Therefore, if we can improve 
the “2” in these two formulas, then we can deduce that 
there are no Siegel zeros! 

10 Smooth Numbers

An integer is y-smooth if all of its prime factors are 
less than or equal to y. A proportion 1 − log 2 of the
integers up to x are $\sqrt{x}$-smooth, and indeed, for any
fixed u > 1 there exists some number $\rho(u) > 0$ such
that if $x = y^u$, then a proportion $\rho(u)$ of the integers
up to x are y-smooth. This proportion does not seem
to have any easy definition in general. For $1 \le u \le 2$ we
have $\rho(u) = 1 - \log u$, but for larger u it is best defined
as

$$\rho(u) = \frac{1}{u} \int_0^1 \rho(u - t)\,dt,$$

an integral delay equation. Such an equation is typical 
when we give precise estimates for questions that arise 
in sieve theory. 
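
The delay equation determines $\rho$ step by step once its values on [0, 1] are known. The sketch below assumes $\rho(u) = 1$ for $0 \le u \le 1$ and discretizes $u\rho(u) = \int_{u-1}^{u} \rho(s)\,ds$ with the trapezoid rule; the function name `dickman_rho` and the grid size are choices made for illustration.

```python
from math import log


def dickman_rho(u_max, steps=500):
    """Tabulate rho on the grid u = i / steps for 0 <= u <= u_max.

    Assumes rho = 1 on [0, 1]. For u > 1 the trapezoid rule applied to
    u * rho(u) = integral of rho over [u - 1, u] contains rho(u) itself
    with weight h / 2, so the equation is solved for rho(u) at each step.
    """
    h = 1.0 / steps
    total = int(round(u_max * steps))
    rho = [1.0] * (total + 1)
    for i in range(steps + 1, total + 1):
        u = i * h
        interior = sum(rho[i - steps + 1 : i])
        rho[i] = h * (0.5 * rho[i - steps] + interior) / (u - 0.5 * h)
    return rho
```

With the default grid, the tabulated value at u = 2 agrees with 1 − log 2 to about three decimal places, matching the closed form quoted above for $1 \le u \le 2$.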

Questions about the distribution of smooth numbers 
arise frequently in the analysis of algorithms, and have 
consequently been the focus of a lot of recent research. 


(See computational number theory [IV.3 §3] for an 
example of the use of smooth numbers.) 


11 The Circle Method 


Another method of analysis that plays a prominent role 
in this subject is the so-called circle method, which goes 
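back to Hardy [VI.73] and Littlewood [VI.79]. This
method uses the fact that, for any integer n,

$$\int_0^1 e^{2\pi i n t}\,dt = \begin{cases} 1 & \text{if } n = 0, \\ 0 & \text{otherwise.} \end{cases}$$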

For example, if we wish to count the number, r(n), of 
solutions to the equation p + q = n with p and q prime, 
we can express it as an integral as follows: 


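$$r(n) = \sum_{\substack{p, q \le n \\ p, q \text{ prime}}} \int_0^1 e^{2\pi i (p + q - n)t}\,dt = \int_0^1 e^{-2\pi i n t} \Bigl(\sum_{\substack{p \text{ prime} \\ p \le n}} e^{2\pi i p t}\Bigr)^2 dt.$$

The first equality holds because the integral is 0 when
$p + q \ne n$ and 1 otherwise, and the second is easy to
check.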

At first sight it looks more difficult to estimate the 
integral than it is to estimate r(n) directly, but this
is not the case. For instance, the prime number theo- 
rem for arithmetic progressions allows us to estimate 
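$P(t) = \sum_{p \le n} e^{2\pi i p t}$ when t is a rational $\ell/m$ with m
small. For in this case,

$$P(\ell/m) = \sum_{a \bmod m} e^{2\pi i a\ell/m} \sum_{\substack{p \le n \\ p \equiv a \ (\mathrm{mod}\ m)}} 1.$$

If t is sufficiently close to $\ell/m$, then $P(t) \approx P(\ell/m)$;
such values of t are called the major arcs and we believe
that the integral over the major arcs gives, in total, a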
very good approximation to r (n); indeed, we get some- 
thing very close to the quantity one predicts from some- 
thing like (15). Thus to prove the Goldbach conjecture 
we need to show that the contribution to the integral 
from the other values of t (that is, from the minor arcs)
is small. In many problems one can successfully do this, 
but no one has yet succeeded in doing so for the Gold- 
bach problem. Also useful is the “discrete analogue” of 
the above: using the identity 

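$$\frac{1}{m} \sum_{j=0}^{m-1} e^{2\pi i j n/m} = \begin{cases} 1 & \text{if } n \equiv 0 \pmod{m}, \\ 0 & \text{otherwise} \end{cases}$$

(which holds for any given integer $m \ge 1$), we have that

$$r(n) = \sum_{\substack{p, q \le n \\ p, q \text{ prime}}} \frac{1}{m} \sum_{j=0}^{m-1} e^{2\pi i j (p + q - n)/m} = \frac{1}{m} \sum_{j=0}^{m-1} e^{-2\pi i j n/m} P(j/m)^2$$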

provided m > n. A similar analysis can be used here 
but working mod m sometimes has advantages, as it 
allows us to use properties of the multiplicative group 
mod m. 
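
The discrete form of the identity can be verified directly for small n. The sketch below counts ordered pairs (p, q) of primes with p + q = n in two ways; the function names are hypothetical, and m = n + 1 is simply the smallest modulus satisfying m > n.

```python
import cmath


def primes_upto(n):
    """Primes <= n by trial division (n is small here)."""
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def r_direct(n):
    """Number of ordered pairs (p, q) of primes with p + q = n."""
    ps = set(primes_upto(n))
    return sum(1 for p in ps if (n - p) in ps)


def r_via_exponential_sums(n, m=None):
    """(1/m) * sum over j of e^{-2 pi i j n / m} * P(j/m)^2,
    where P(t) is the sum of e^{2 pi i p t} over primes p <= n."""
    m = n + 1 if m is None else m          # any modulus m > n works
    ps = primes_upto(n)
    total = 0j
    for j in range(m):
        P = sum(cmath.exp(2j * cmath.pi * p * j / m) for p in ps)
        total += cmath.exp(-2j * cmath.pi * j * n / m) * P * P
    return (total / m).real
```

Up to floating-point error, `r_via_exponential_sums(n)` equals `r_direct(n)`; for n = 10 both give 3, from the ordered pairs (3, 7), (7, 3), and (5, 5).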

Sums like P(j/m) in the paragraph above or more
simple sums like $\sum_{n \le N} e^{2\pi i n^k/m}$ are called exponential
sums. They play a central role in many of the calcu- 
lations one does in analytic number theory. There are 
several techniques for investigating them. 

(1) It is easy to sum the geometric progression
$\sum_{n \le N} e^{2i\pi n/m}$. With higher-degree polynomials one can
often reduce to this case; for example, by writing
$n_1 - n_2 = h$ we have

\[ \Bigl| \sum_{n \le N} e^{2i\pi n^2/m} \Bigr|^2 = \sum_{n_1, n_2 \le N} e^{2i\pi(n_1^2 - n_2^2)/m} = \sum_{|h| \le N} e^{2i\pi h^2/m} \sum_{\max\{0,-h\} < n_2 \le \min\{N, N-h\}} e^{4i\pi h n_2/m}, \]

and the inner sum is now a geometric progression. 

(2) The work of Weil and Deligne, which gives very accurate
results on the number of solutions to equations
mod p, is ideally suited to many applications in analytic
number theory. For example, the “Kloosterman
sum” $\sum_{a_1 a_2 \cdots a_k \equiv b \ (\mathrm{mod}\ p)} e^{2i\pi(a_1 + a_2 + \cdots + a_k)/p}$, where the
$a_i$ run over the integers mod p and (b, p) = 1, appears
naturally in many questions; Deligne showed that it
has absolute value less than or equal to $k p^{(k-1)/2}$, an
extraordinary amount of cancellation in this sum which
has about $p^{k-1}$ summands, each of absolute value 1.
(See the Weil conjectures [V.38].)

(3) We discussed earlier the fact that the values of ζ(s)
satisfy a symmetry about the line Re(s) = 1/2, given by
the “functional equation.” There are other functions
(called “modular functions”) that also have symmetries
in the complex plane; typically the value of the
function at s is related to the value of the function at
(αs + β)/(γs + δ), for some integers α, β, γ, δ satisfying
αδ − βγ = 1. Sometimes an exponential sum can be
related to the value of a modular function, and subsequently
to the value of that modular function at another
point, using the symmetry of the function.

12 More L-Functions

There are many types of L-functions beyond Dirich- 
let L-functions, some of which are well understood, 
some not. The type that has received the most atten- 
tion recently is a class of L-functions that can be asso- 
ciated with elliptic curves (see arithmetic geometry 
[IV.5 §5.1]). An elliptic curve E is given by an equation
of the form $y^2 = x^3 + ax + b$, where the discriminant
$4a^3 + 27b^2$ is nonzero. The associated L-function
L(E, s) is most easily described in terms of its Euler
product:

\[ L(E,s) = \prod_p \Bigl( 1 - \frac{a_p}{p^s} + \frac{p}{p^{2s}} \Bigr)^{-1}. \tag{17} \]

Here $a_p$ is an integer which, for primes p not dividing
$4a^3 + 27b^2$, is defined to be p minus the number of solutions
(x, y) (mod p) to the equation $y^2 \equiv x^3 + ax + b$
(mod p). It can be shown that each $|a_p|$ is less than
$2\sqrt{p}$, so the Euler product above converges absolutely
when Re(s) > 3/2. Therefore, (17) is a good definition for
these values of s. Can we now extend it to the whole of
the complex plane, as we did for ζ(s)? This is a very
deep problem: the answer is yes; in fact, it is the celebrated
theorem of Andrew Wiles that implied Fermat’s
last theorem [V.12].
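The definition of $a_p$ is easy to try out for small primes. The sketch below counts solutions by brute force; the function name `a_p` and the choice of curve in the usage note are ours, and the method is only sensible for small p.

```python
def a_p(a, b, p):
    """Return p minus the number of solutions (x, y) mod p of y^2 = x^3 + a*x + b."""
    # square_roots[r] = number of y mod p with y^2 = r (mod p).
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    # Python's % always returns a value in [0, p), even for negative inputs.
    solutions = sum(square_roots[(x ** 3 + a * x + b) % p] for x in range(p))
    return p - solutions
```

For the curve $y^2 = x^3 - x$ (so a = −1, b = 0) one finds `a_p(-1, 0, 5) == -2`, which indeed satisfies $|a_p| < 2\sqrt{5}$.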

Another interesting question is to understand the
distribution of values of $a_p/(2\sqrt{p})$ as we range over
primes p. These all lie in the interval [−1, 1]. One might
expect them to be uniformly distributed in the interval,
but in fact this is never the case. As discussed in
algebraic numbers [IV.1] one can write $a_p = \alpha_p + \bar{\alpha}_p$,
where $|\alpha_p| = \sqrt{p}$, and $\alpha_p$ is called the Weil number. If
we write $\alpha_p = \sqrt{p}\,e^{i\theta_p}$, then $a_p = 2\sqrt{p}\cos(\theta_p)$ for some
angle $\theta_p \in [0, \pi]$. We can then think of $\theta_p$ as belonging
to the upper half of a circle. The surprise is that
for almost all elliptic curves the $\theta_p$ are not uniformly
distributed, which would mean the proportion in a certain
arc would be proportional to the length of that arc.
Rather, they are distributed in such a way that the proportion
of them in any given arc is proportional to the
area under that arc. This is a recent result of Richard
Taylor.

The correct analogue of the Riemann hypothesis for 
L(E,s) turns out to be that all the nontrivial zeros lie 
on the line Re(s) = 1. This is believed to be true. Moreover,
it is believed that they, like the zeros of ζ(s),
are distributed according to the rules that govern the
eigenvalues of randomly chosen matrices.

These L-functions often have zeros at s = 1 (which
is linked to the Birch–Swinnerton-Dyer conjecture
[V.4]) and these zeros repel zeros of Dirichlet
L-functions (which is what was used by Goldfeld, Gross,
and Zagier, as mentioned in section 4, to get their lower
bound on $h_{-q}$).

L-functions arise in many areas of arithmetic geom- 
etry, and their coefficients typically describe the num- 
ber of points satisfying certain equations mod p. The 
Langlands program seeks to understand these connec- 
tions at a deep level. 

It seems that every “natural” L-function has many 
of the same analytic properties as those discussed 
in this article. Selberg has proposed that this phenomenon
should be even more general. Consider sums
$A(s) = \sum_{n \ge 1} a_n/n^s$ that

• are well-defined when Re(s) > 1,

• have an Euler product $\prod_p (1 + b_p/p^s + b_{p^2}/p^{2s} + \cdots)$
in this (or an even smaller) region,

• have coefficients $a_n$ that are smaller than any
given power of n, once n is sufficiently large,

• satisfy $|b_n| < K n^{\theta}$ for some constants θ < 1/2 and
K > 0.

Selberg conjectures that we should be able to give a 
good definition to A(s) on the whole complex plane, 
and that A(s) should have a symmetry connecting the 
value of A(s) with A(1 − s). Furthermore, he conjectures
that the Riemann hypothesis should hold for A(s)! 

The current wishful thinking is that Selberg’s family
of L-functions is precisely the same as those considered 
by Langlands. 

13 Conclusion 

In this article we have described current thinking on 
several key questions about the distribution of primes. 
It is frustrating that after centuries of research so little 
has been proved, the primes guarding their mysteries 
so jealously. Each new breakthrough seems to require 
brilliant ideas and extraordinary technical prowess. As 
Euler [VI.19] wrote in 1770:

Mathematicians have tried in vain to discover some
order in the sequence of prime numbers but we have
every reason to believe that there are some mysteries
which the human mind will never penetrate.

IV.3 Computational Number Theory

Carl Pomerance 


1 Introduction 

Historically, computation has been a driving force in 
the development of mathematics. To help measure the 
sizes of their fields, the Egyptians invented geometry.
To help predict the positions of the planets, the Greeks
invented trigonometry. Algebra was invented to deal
with equations that arose when mathematics was used
to model the world. The list goes on, and it is not just 
historical. If anything, computation is more important 
than ever. Much of modern technology rests on algo- 
rithms that compute quickly: examples range from the 
wavelets [VII. 3] that allow CAT scans, to the numerical 
extrapolation of extremely complex systems in order to 
predict weather and global warming, and to the combinatorial
algorithms that lie behind Internet search
engines (see the mathematics of algorithm design
[VII.5 §6]).

In pure mathematics we also compute, and many 
of our great theorems and conjectures are, at root, 
motivated by computational experience. It is said that 
Gauss [VI.26], who was an excellent computationalist,
needed only to work out a concrete example or two 
to discover, and then prove, the underlying theorem. 
While some branches of pure mathematics have per- 
haps lost contact with their computational origins, the 
advent of cheap computational power and convenient 
mathematical software has helped to reverse this trend. 

One mathematical area where the new emphasis on 
computation can be clearly felt is number theory, and 
that is the main topic of this article. A prescient call-to- 
arms was issued by Gauss as long ago as 1801: 

The problem of distinguishing prime numbers from 
composite numbers, and of resolving the latter into 
their prime factors, is known to be one of the most 
important and useful in arithmetic. It has engaged the 
industry and wisdom of ancient and modern geometers 
to such an extent that it would be superfluous to dis- 
cuss the problem at length. Nevertheless we must con- 
fess that all methods that have been proposed thus far 
are either restricted to very special cases or are so labo- 
rious and difficult that even for numbers that do not 
exceed the limits of tables constructed by estimable 
men, they try the patience of even the practiced cal- 
culator. And these methods do not apply at all to 
larger numbers. . . . Further, the dignity of the science 
itself seems to require that every possible means be 
explored for the solution of a problem so elegant and 
so celebrated. 

Factorization into primes is a very basic issue in 
number theory, but essentially all branches of num- 
ber theory have a computational component. And in 
some areas there is such a robust computational liter- 
ature that we discuss the algorithms involved as math- 
ematically interesting objects in their own right. In this 
article we will briefly present a few examples of the 
computational spirit: in analytic number theory (the 
distribution of primes and the Riemann hypothesis); 


in Diophantine equations (Fermat’s last theorem and 
the ABC conjecture); and in elementary number theory 
(primality and factorization). A secondary theme that 
we shall explore is the strong and constructive inter- 
play between computation, heuristic reasoning, and 
conjecture. 

2 Distinguishing Prime Numbers 
from Composite Numbers 

The problem is simple to state. Given an integer n > 1,
decide if n is prime or composite. And we all know an 
algorithm. Divide n by each positive integer in turn. 
Either we find a proper factor, in which case we know 
that n is composite, or we do not, in which case we 
know that n is prime. For example, take n = 269. It is 
odd, so it has no even divisors. It is not a multiple of 3, 
so it has no divisor which is a multiple of 3. Continuing, 
we rule out 5, 7, 11, and 13. The next possibility, 17, has
a square that is greater than 269, which means that if
269 were a multiple of 17, then it would also have to be
a multiple of some number less than 17. Since we have
ruled that out, we can stop our trial division at 13 and
conclude that 269 is prime. (If we were actually carrying
out the algorithm, we might try dividing 269 by 17, in
which case we would discover that 269 = 15 × 17 + 14.
At that point we would notice that the quotient, 15,
is less than 17, which is what would tell us that $17^2$
was greater than 269. Then we could stop.) In general,
since a composite number n has a proper factor d with
$1 < d \le \sqrt{n}$, one can give up on the trial dividing once one
passes $\sqrt{n}$, at which point we know that n is prime.
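Trial division as just described can be sketched in a few lines. The function names are ours; the loop stops as soon as the trial divisor d satisfies d² > n, which is the stopping rule explained above.

```python
def smallest_factor(n):
    """Return the smallest divisor d > 1 of n (n >= 2); this is n itself when n is prime."""
    d = 2
    while d * d <= n:  # stop once d passes the square root of n
        if n % d == 0:
            return d
        d += 1
    return n


def is_prime_trial(n):
    """Decide primality of n by trial division."""
    return n > 1 and smallest_factor(n) == n
```

Here `is_prime_trial(269)` tries the divisors 2 through 16 and then stops, since 17² > 269.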

This straightforward method is excellent for men- 
tal computation with small numbers, and for machine 
computation for somewhat larger numbers. But it 
scales poorly, in that if you double the number of digits 
of n, then the time for the worst case is squared; it is 
therefore an “exponential time” algorithm. One might 
tolerate such an algorithm for twenty-digit inputs, but 
think how long it would take to establish the primality 
of a forty-digit number! And you can forget about num- 
bers with hundreds or thousands of digits. The issue 
of how the running time of an algorithm scales when 
one goes to larger inputs is absolutely paramount in 
measuring one algorithm against another. In contrast 
to the exponential time it takes to use trial division to 
recognize primes, consider the problem of multiplying 
two numbers. The school method of multiplication is 
to take each digit of one number in turn and multiply 
it by the other number, forming a parallelogram array.
One then performs an addition to obtain the answer.
If you now double the number of digits in each num- 
ber, then the parallelogram becomes twice as large in 
each dimension, so the running time grows by a factor 
of about 4. Multiplication of two numbers is an exam- 
ple of a “polynomial time” algorithm; its running time 
scales by a constant factor when the input length is 
doubled. 

One might then rephrase Gauss’s call to arms as fol- 
lows. Is there a polynomial time algorithm that distin- 
guishes prime numbers from composite numbers? Is 
there a polynomial time algorithm that can produce a 
nontrivial factor of a composite number? It might not 
be apparent at this point that these are two different 
questions, since trial division does both. We will see, 
though, that it is convenient to separate them, as did 
Gauss. 

Let us focus on recognizing primes. What we would 
like is a simply computed criterion that primes satisfy 
and composites do not, or vice versa. An old theorem 
of Wilson might just fit the bill. Note that 6! = 720,
which is just one less than a multiple of 7. Wilson’s
theorem asserts that if n is prime, then (n − 1)! ≡ −1
(mod n). (The meaning of this and similar statements
is explained in modular arithmetic [III.60].) This can- 
not hold when n is composite, for if p is a prime 
factor of n and is smaller than n, then it is a fac- 
tor of (n- 1)!, so it cannot possibly be a factor of 
(n - 1)! + 1. Thus, we have an ironclad criterion for 
primality. However, the Wilson criterion does not meet 
the standard of being simply computed, since we know 
no especially rapid way of computing factorials modulo
another number. For example, Wilson predicts that
268! ≡ −1 (mod 269), as we have already seen that
269 is prime. But if we did not know this already, how
in the world could we quickly find the remainder when
268! is divided by 269? We can work out the product
268! one factor at a time, but this would take many
more steps than trying divisors up to 17. It is hard to
prove that something cannot be done, and in fact there
is no theorem that says we cannot compute a! mod b
in polynomial time. We do know some ways of speeding
up the computation over the totally naive method,
but all methods known so far take exponential time.
So, Wilson’s theorem initially seems promising, but in
fact it is no help at all unless we can find a fast way to
compute a! mod b.

How about Fermat’s little theorem [III.60]? Note
that $2^7 = 128$, which is 2 more than a multiple of 7. Or
take $3^5 = 243$, which is 3 mod 5. Fermat’s little theorem
tells us that if n is prime and a is any integer, then
$a^n \equiv a$ (mod n). If computing a large factorial modulo
n is hard, perhaps it is also hard to compute a large
power modulo n.

It cannot hurt to try it out for some moderate example
to see if any ideas pop up. Take a = 2 and n = 91, so
that we are trying to compute $2^{91}$ mod 91. A powerful
idea in mathematics is that of reduction. Can we reduce
this computational problem to a smaller one? Notice
that if we had already computed $2^{45}$ mod 91, obtaining
a remainder $r_1$, say, then $2^{91} \equiv 2r_1^2$ (mod 91). That is,
it is just a short additional calculation to get to our goal,
yet the power 45 is only half as big. How to continue
is clear: we further reduce to the exponent 22, which is
less than half of 45. If $2^{22}$ mod 91 = $r_2$, then $2^{45} \equiv 2r_2^2$
(mod 91). And of course $2^{22}$ is the square of $2^{11}$, and
so on. It is not so hard to “automate” this procedure:
the exponent sequence

1, 2, 5, 11, 22, 45, 91

can be read directly from the binary (base 2) representation
of 91 as 1011011, since the above sequence in
binary is

1, 10, 101, 1011, 10110, 101101, 1011011.

These are the initial strings from the left of 1011011.
And it is plain that the transition from one term to the
next is either the double or the double plus 1.

This procedure scales nicely. When the number of 
digits of n is doubled, so is the sequence of expo- 
nents, and the time it takes to get from one exponent 
to the next, being a modular multiplication, is multi- 
plied by 4. (As with naive multiplication, naive divide- 
with-remainder also takes four times as long when the 
size of the problem is doubled.) Thus, the overall time 
is multiplied by 8, yielding a polynomial time method. 
We call this the “powermod” algorithm. 

So, let us try to illustrate Fermat’s little theorem,
taking a = 2 and n = 91. Our sequence of powers is

\[ 2^1 \equiv 2,\quad 2^2 \equiv 4,\quad 2^5 \equiv 32,\quad 2^{11} \equiv 46,\quad 2^{22} \equiv 23,\quad 2^{45} \equiv 57,\quad 2^{91} \equiv 37, \]

where each congruence is modulo 91, and each term in
the sequence is found by squaring the prior one mod 91
or squaring and multiplying by 2 mod 91.
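The powermod procedure can be written out as a short sketch that also records the exponents visited along the way. The name `powermod_trace` is ours; in practice Python's built-in three-argument `pow(a, e, n)` computes the same final residue.

```python
def powermod_trace(a, e, n):
    """Compute a**e mod n by reading the binary digits of e from the left.

    Returns the list of (exponent, residue) pairs visited, ending with (e, a**e mod n).
    """
    result = 1
    exponent = 0
    trace = []
    for bit in bin(e)[2:]:  # binary digits of e, most significant first
        result = result * result % n  # doubling the exponent means squaring
        exponent *= 2
        if bit == "1":
            result = result * a % n  # "double plus 1": multiply once more by a
            exponent += 1
        trace.append((exponent, result))
    return trace
```

Calling `powermod_trace(2, 91, 91)` reproduces the exponent sequence 1, 2, 5, 11, 22, 45, 91 together with the residues listed above.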

Wait a second: does Fermat’s little theorem not say 
that we are supposed to get 2 for the final residue? Well, 
yes, but this is guaranteed only if n is prime. And as you 
have probably already noticed, 91 is composite. In fact,
the computation proves this. 





Quite remarkably, here is an example of a computa- 
tion that proves that n is composite, yet it does not 
reveal any nontrivial factorization! 

You are invited to try out the powermod algorithm 
as above, but to change the base of the power from 
2 to 3. The answer you should come to is that $3^{91} \equiv 3$
(mod 91): that is, the congruence for Fermat’s little the- 
orem holds. Since you already know that 91 is compos- 
ite, I am sure you would not jump to the false conclu- 
sion that it is prime! So, as it stands, Fermat’s little the- 
orem can sometimes be used to recognize composites, 
but it cannot be used to recognize primes. 

There are two interesting further points to be made 
regarding Fermat’s little theorem. First, on the nega- 
tive side, there are some composites, such as n = 561, 
where the Fermat congruence holds for every integer a. 
These numbers n are called Carmichael numbers, and 
unfortunately (from the point of view of testing pri- 
mality) there are infinitely many of them, a result due 
to Alford, Granville, and me. But, on the positive side, 
if one were to choose randomly among all pairs a, n
for which $a^n \equiv a$ (mod n), with a < n and n bounded
by a large number x, almost certainly (as x grows) you
would choose a pair with n prime, a result of Erdős and
myself.
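For small n the Fermat congruence can simply be checked for every base. The brute-force sketch below (the function name is ours, and the method is far too slow for large n) picks out the composites for which the congruence never fails.

```python
def is_carmichael(n):
    """Return True if n is composite and a**n = a (mod n) for every a in [0, n-1]."""
    if n < 3 or all(n % d for d in range(2, int(n ** 0.5) + 1)):
        return False  # primes (and n < 3) do not count
    # Three-argument pow is Python's built-in modular exponentiation.
    return all(pow(a, n, n) == a for a in range(n))
```

Running this over n < 2000 finds 561, 1105, and 1729, while 91 is rejected because the base a = 2 already fails.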

It is possible to combine Fermat’s little theorem with 
another elementary property of (odd) prime numbers. 
If n is an odd prime, there are exactly two solutions
to the congruence $x^2 \equiv 1$ (mod n), namely ±1. Actually,
some composites have this property as well, but
composites divisible by two different odd primes do
not.

Now let us suppose that n is an odd number and that
we wish to determine whether it is prime. Suppose that
we pick some number a with 1 < a ≤ n − 1 and discover
that $a^{n-1} \equiv 1$ (mod n). If we set $x = a^{(n-1)/2}$, then
$x^2 = a^{n-1} \equiv 1$ (mod n); so, by the simple property
of primes just mentioned, if n is prime, then x must
be congruent to ±1 (mod n). Therefore, if we calculate
$a^{(n-1)/2}$ and discover that it is not congruent to ±1
(mod n), then n must be composite.

Let us try this idea with a = 2, n = 561. We
know already that $2^{560} \equiv 1$ (mod 561), so what is
$2^{280}$ mod 561? This too turns out to be 1, so we have
not shown that 561 is composite. However, we can go
further, since now we know that $2^{140}$ is also a square
root of 1 and computing this we find that $2^{140} \equiv 67$
(mod 561). So now we have found a square root of 1
that is not ±1, which proves that 561 is composite. (Of


course, for this particular number, it is obviously divis- 
ible by 3, so there was not really any mystery about 
whether it was prime or composite. But the method can 
be used in much less obvious cases.) In practice, there 
is no need to backtrack from a higher exponent to a 
smaller one. Indeed, in order to calculate $2^{560}$ mod 561
by the efficient method outlined earlier, one calculates
the numbers $2^{140}$ and $2^{280}$ along the way, so that this
generalization of the earlier test is both quicker and 
stronger. 

Here is the general principle that we have illustrated.
Suppose that n is an odd prime and let a be an integer
not divisible by n. Write $n - 1 = 2^s t$, where t is odd.
Then

\[ \text{either } a^t \equiv 1 \pmod{n} \quad \text{or} \quad a^{2^i t} \equiv -1 \pmod{n} \]

for some i = 0, 1, ..., s − 1. Call this the strong Fermat
congruence. The wonderful thing here is that, as
proved independently by Monier and Rabin, there is no
analogue of a Carmichael number. They showed that if
n is an odd composite, then the strong Fermat congruence
fails for at least three quarters of the choices for
a with 1 ≤ a ≤ n − 1.

If you want only to be able to distinguish between 
primes and composites in practice, and you do not 
insist on proof, then you have read enough. Namely, 
given a large odd number n, choose twenty values of 
a at random from [1, n - 1], and begin trying to verify 
the strong Fermat congruence with these bases a. If it 
should ever fail, you may stop: the number n must be 
composite. And if the strong Fermat congruence holds, 
we might surmise that n is actually prime. Indeed, if 
n were composite, the Monier-Rabin theorem says that 
the chance that the strong Fermat congruence would 
hold for twenty random bases is at most $4^{-20}$, which
is less than one chance in a trillion. Thus we have a 
remarkable probabilistic test for primality. If it tells us 
that n is composite, then we know for sure that n is 
composite; if it tells us that n is prime, then the chances 
that n is not prime are so small as to be more or less 
negligible. 
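The probabilistic test just described might be sketched as follows. The function names and the default of twenty rounds follow the text but are otherwise our own choices; the randomness comes from Python's standard `random` module, which is adequate for illustration only.

```python
import random


def strong_fermat_holds(a, n):
    """Check the strong Fermat congruence for an odd number n > 2 and base a."""
    s, t = 0, n - 1
    while t % 2 == 0:  # write n - 1 = 2^s * t with t odd
        s += 1
        t //= 2
    x = pow(a, t, n)
    if x == 1 or x == n - 1:  # a^t = 1, or the case i = 0
        return True
    for _ in range(s - 1):  # the cases i = 1, ..., s - 1
        x = x * x % n
        if x == n - 1:
            return True
    return False


def is_probable_prime(n, rounds=20):
    """Return False if n is certainly composite, True if n is very probably prime."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    for _ in range(rounds):
        a = random.randrange(1, n)  # a chosen at random from [1, n - 1]
        if not strong_fermat_holds(a, n):
            return False  # a proof that n is composite
    return True
```

For n = 561 the base a = 2 already fails the strong congruence, in agreement with the computation of $2^{140}$ mod 561 above.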

If three quarters of the numbers a in [1, n − 1] provide
the key to an easily checkable proof that the odd
composite number n is indeed composite, surely it 
should not be so hard to find just one! How about 
checking small numbers a, in order, until one is found? 
Excellent, but when do we stop? Let us think about this 
for a moment. We have given up the power of randomness
and are forcing ourselves to choose sequentially
among small numbers for the trial bases a. Can we
argue heuristically that they continue to behave as if
they were random choices? Well, there are some con- 
nections among them. For example, if taking a = 2 does 
not result in a proof that n is composite, then neither 
will taking any power of 2. It is theoretically possible for 
2 and 3 not to give proofs that n is composite but for 
6 to work just fine, but this turns out not to be very 
common. So let us amend the heuristic and assume 
that we have independence for prime values of a. Up 
to log n log log n there are about log n primes (via the 
prime number theorem [V.29] discussed later in this 
article); so, heuristically, the probability that n is composite,
but that none of these primes help us to prove it,
is about $4^{-\log n} < n^{-4/3}$. Since the infinite sum $\sum n^{-4/3}$
converges, perhaps a stopping point of log n log log n
is sufficient, at least for large n.

Miller was able to prove the slightly weaker result
that a stopping point of $c(\log n)^2$ is adequate, but
his proof assumes a generalization of the riemann 
hypothesis [V.29]. (We discuss the Riemann hypoth- 
esis below; the generalization that Miller assumes is 
beyond the scope of this article.) In further work, Bach 
was able to show that we may take c = 2 in this 
last result. Summarizing, if this generalized Riemann 
hypothesis holds, and if the strong Fermat congruence 
holds for every positive integer $a \le 2(\log n)^2$, then n
is prime. So, provided that a famous unproved hypothesis
in another field of mathematics is correct, one can
decide in polynomial time, via a deterministic algo- 
rithm, whether n is prime or composite. (It has been 
tempting to use this conditional test, for if it should 
ever lie to you and tell you that a particular compos- 
ite number is prime, then this failure — if you were able 
to detect it— would be a disproof of one of the most 
famous conjectures in mathematics. Perhaps this is not 
too disastrous a failure!) 

After Miller’s test in the 1970s, the question con- 
tinually challenging us was whether it is possible to 
test for primality in polynomial time without assuming 
unproved hypotheses. Recently, Agrawal et al. (2004) 
answered this question with a resounding yes. Their 
idea begins with a combination of the binomial theorem
and Fermat’s little theorem. Given an integer a,
consider the polynomial $(x + a)^n$ and expand it in the
usual way through the binomial theorem. Each intermediate
term between the leading $x^n$ and the trailing $a^n$
has the coefficient $n!/(j!(n - j)!)$ for some j between
1 and n − 1. If n is prime, then this coefficient, which
is an integer, is divisible by n because n appears as 
a factor in the numerator that is not canceled by any 


factors in the denominator. That is, the coefficient is
0 (mod n). For example, $(x + 1)^7$ is equal to

\[ x^7 + 7x^6 + 21x^5 + 35x^4 + 35x^3 + 21x^2 + 7x + 1, \]

and we see each internal coefficient is a multiple of 7.
Thus, we have $(x + 1)^7 \equiv x^7 + 1$ (mod 7). (Two polynomials
are congruent mod n if corresponding coefficients
are congruent mod n.) In general, if n is prime
and a is any integer, then via this binomial-theorem
idea and Fermat’s little theorem we have

\[ (x + a)^n \equiv x^n + a^n \equiv x^n + a \pmod{n}. \]

It is an easy exercise to show that this congruence in the 
simple case a = 1 is actually equivalent to primality. 
But as with the Wilson criterion we know no way of 
quickly verifying that all these coefficients are indeed 
divisible by n. 
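The case a = 1 can be tested directly for small n by examining every internal binomial coefficient. This sketch (function names ours) takes about n steps, so it illustrates the criterion rather than providing a fast test.

```python
from math import comb


def internal_coefficients(n):
    """Return the coefficients of x^(n-1), ..., x in the expansion of (x + 1)^n."""
    return [comb(n, j) for j in range(1, n)]


def prime_by_binomials(n):
    """Return True exactly when (x + 1)^n = x^n + 1 (mod n), that is, when n is prime."""
    return n > 1 and all(c % n == 0 for c in internal_coefficients(n))
```

For n = 7 the internal coefficients are 7, 21, 35, 35, 21, 7, all multiples of 7; for n = 4 the middle coefficient 6 is not a multiple of 4.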

However, one can do more with polynomials than
raise them to powers. We can also divide one polynomial
by another to find a quotient and a remainder,
just as we do with integers. It makes sense, for
example, to say that g(x) ≡ h(x) (mod f(x)), meaning
that g(x) and h(x) leave the same remainder
when divided by f(x). We will write g(x) ≡ h(x)
(mod n, f(x)) if the remainders upon division by f(x)
are congruent mod n. As with the powermod algorithm
for integer congruences, we can quickly compute
$g(x)^n$ (mod n, f(x)), provided the degree of f(x) is
not too big. This is exactly what Agrawal et al. propose.
They have an auxiliary polynomial f(x) of not-too-high
degree such that, if

\[ (x + a)^n \equiv x^n + a \pmod{n, f(x)} \]

for each a = 1, 2, ..., B, for a not-too-high bound B, then
n must be in a set that contains the primes and certain
composites that are easily recognized as composites.
(Not all composites are hard to recognize as such, e.g.,
any number with a small prime factor is easy to recognize.)
These ideas put together form the primality
test of Agrawal et al. To give the argument in full detail
one has to specify the auxiliary polynomial f(x) that
is used and what the bound B is, and one has to prove
rigorously that it is exactly the primes which pass the
test.
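The polynomial version of powermod can be sketched for the auxiliary polynomial $f(x) = x^r - 1$, where reducing mod f(x) just means replacing $x^r$ by 1. The function names are ours, and the sketch checks a single congruence for given n, a, and r; it does not choose r or the bound B, so it is not the full test of Agrawal et al.

```python
def polymul_mod(g, h, n, r):
    """Multiply polynomials g and h (coefficient lists of length r) mod n and mod x^r - 1."""
    out = [0] * r
    for i, gi in enumerate(g):
        if gi:
            for k, hk in enumerate(h):
                # x^(i+k) reduces to x^((i+k) mod r) because x^r = 1.
                out[(i + k) % r] = (out[(i + k) % r] + gi * hk) % n
    return out


def polypow_mod(g, e, n, r):
    """Compute g(x)^e mod (n, x^r - 1) by repeated squaring, as in powermod."""
    result = [1 % n] + [0] * (r - 1)
    for bit in bin(e)[2:]:
        result = polymul_mod(result, result, n, r)
        if bit == "1":
            result = polymul_mod(result, g, n, r)
    return result


def congruence_holds(n, a, r):
    """Check whether (x + a)^n = x^n + a (mod n, x^r - 1), for n >= 2 and r >= 2."""
    lhs = polypow_mod([a % n, 1] + [0] * (r - 2), n, n, r)
    rhs = [0] * r
    rhs[0] = a % n
    rhs[n % r] = (rhs[n % r] + 1) % n  # x^n reduces to x^(n mod r)
    return lhs == rhs
```

For a prime such as n = 7 the congruence holds for every a and r, whereas `congruence_holds(4, 1, 3)` fails, since $(x + 1)^4$ reduces to $2x^2 + x + 1$ rather than $x + 1$.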

Agrawal et al. (2004) show that the auxiliary polynomial
f(x) can be taken to be the beautifully simple
$x^r - 1$, with an elementary upper bound for r of
about $(\log n)^5$. Doing this leads to a time bound of
about $(\log n)^{10.5}$ for the algorithm. Using a numerically
ineffective tool, they bring the time bound down
to $(\log n)^{7.5}$. Recently, Lenstra and I presented a not-so-simple
but numerically effective method of bringing the
exponent on log n down to 6. We did this by expanding
the set of polynomials used beyond those of the
form $x^r - 1$: in particular we used polynomials that are
related to Gauss’s famous algorithm for construction of
certain regular n-gons with straightedge and compass
(see algebraic numbers [IV.1 §13]). It was indeed satisfying
to us to bring in a famous tool of Gauss to say
something about his problem of distinguishing prime 
numbers from composite numbers. 

Are the new polynomial-time primality tests good in 
practice? So far, the answer is no, the competition is 
just too tough. For example, using the arithmetic of 
elliptic curves [III.21] we can come up with bona fide
proofs of primality for huge numbers. This algorithm 
is conjectured to run in polynomial time but we have 
not even proved that it always terminates. If, at the end 
of the day, or in this case the end of the run, we have a 
legitimate proof, then perhaps we can tolerate the situ- 
ation of not being sure that it would work out when we 
started! The method, pioneered by Atkin and Morain, 
has recently proved the primality of a number that has 
over 20 000 decimal digits, and is not of some special 
form such as $2^n - 1$ that makes testing for primality
easier. The record for the new breed of polynomial-time 
tests is a measly 300 digits. 

For numbers of certain special forms there are much 
faster primality tests. Mersenne primes comprise the 
most famous of these forms; these are primes that are 
1 less than a power of 2. It is suspected that there are 
infinitely many examples, but we seem to be very far 
from a proof of this. Just forty-three Mersenne primes 
are known, the record example being $2^{30402457} - 1$, a
prime with more than 9.15 million decimal digits. 

For much more on primality testing, and for ref- 
erences to various other sources, see Crandall and 
Pomerance (2005). 

3 Factoring Composite Numbers 

Compared with what we know about testing primality, 
our ability to factor large numbers is still in the dark 
ages. In fact this imbalance between the two problems
forms the bulwark for the security of electronic com- 
merce on the Internet. (See mathematics and cryp- 
tography [VII. 7] for an account of why.) This is a very 
important application of mathematics, but also an odd 
one, and not something to brag about, since it depends 
on the inability of mathematicians to efficiently solve a 
basic problem! 


Nevertheless, we do have our tricks. Part of the landscape
is Euclid’s algorithm [III.22] for computing the
greatest common divisor (GCD) of two numbers. One
might naively think that, to find the GCD of two posi- 
tive integers m and n, one should find all of their divi- 
sors and pick the largest one common to the two. But 
Euclid’s algorithm is much more efficient: the number
of arithmetic steps is bounded by a constant multiple of
the logarithm of the smaller number, so not only does it
run in polynomial time, it is in fact quite speedy.
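Euclid's algorithm is short enough to write out in full. The sketch below (the function name is ours) also counts the division steps, so that the logarithmic growth can be observed experimentally; Python's standard library offers the same GCD as `math.gcd`.

```python
def gcd_steps(m, n):
    """Return (gcd(m, n), number of division steps) for nonnegative integers m and n."""
    steps = 0
    while n:
        m, n = n, m % n  # replace (m, n) by (n, m mod n)
        steps += 1
    return m, steps
```

For example, `gcd_steps(1649, 34)` returns a GCD of 17 after two division steps.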

So, if we can build up a special number m that may 
be likely to have a nontrivial factor in common with n, 
we can use Euclid’s algorithm to discover this factor. 
For example, Pollard and Strassen (independently) used 
this idea, together with fast subroutines for multipli- 
cation and polynomial evaluation, to enhance the trial 
division method discussed in the last section. Somewhat
miraculously, one can take the integers up to $n^{1/2}$,
break them into $n^{1/4}$ subintervals of length $n^{1/4}$, and
for each subinterval calculate the GCD of n with the
product of all the integers in the subinterval, spending
only about $n^{1/4}$ elementary steps in total. If n is composite,
then at least one GCD will be larger than 1, and
then a search over the first such subinterval will locate 
a nontrivial factor of n. To date, this algorithm is the 
fastest rigorous and deterministic method of factoring 
that we know. 
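The block structure of this method can be illustrated with a simplified sketch. It is not the Pollard–Strassen algorithm proper: the block products here are computed one factor at a time, so the running time is still about $n^{1/2}$, and the saving to $n^{1/4}$ comes only from the fast multiplication and polynomial evaluation subroutines, which are omitted. The function name is ours.

```python
from math import gcd, isqrt


def block_gcd_factor(n):
    """Return the least nontrivial factor of n, or None if there is none up to sqrt(n)."""
    limit = isqrt(n)
    block = isqrt(limit) + 1  # block length about n^(1/4)
    start = 2
    while start <= limit:
        end = min(start + block - 1, limit)
        product = 1
        for k in range(start, end + 1):  # naive product of the block, reduced mod n
            product = product * k % n
        if gcd(product, n) > 1:
            # The first block with a nontrivial GCD contains the least prime factor.
            for k in range(start, end + 1):
                if n % k == 0:
                    return k
        start = end + 1
    return None
```

With n = 91 the blocks are 2 to 5 and 6 to 9; the second block has product 3024, which shares the factor 7 with 91.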

Most practical factoring algorithms are based on 
unproved but reasonable-seeming hypotheses about 
the natural numbers. Although we may not know how 
to prove rigorously that these methods will always pro- 
duce a factorization, or do so quickly, in practice they 
do. This situation resembles the experimental sciences,
where hypotheses are tested against experiments. Our 
experience with certain factoring algorithms is now so 
overwhelming that a scientist might claim that a phys- 
ical law is involved. As mathematicians, we still search 
for proof, but fortunately the numbers we factor do not 
feel the need to wait for us. 

I often mention a contest problem from my high 
school years: factor 8051. The trick is to notice that 
$8051 = 90^2 - 7^2 = (90 - 7)(90 + 7)$, from which
the factorization $83 \cdot 97$ can be read off. In fact every
odd composite can be factored as the difference of
two squares, an idea that goes back to Fermat [VI.12].
Indeed, if n has the nontrivial factorization ab, then let
$u = \frac{1}{2}(a + b)$ and $v = \frac{1}{2}(a - b)$, so that $n = u^2 - v^2$,
and $a = u + v$, $b = u - v$. This method works very well
if n has a divisor very close to $n^{1/2}$, as n = 8051 does,
but in the worst case, the Fermat method is slower than
trial division. 
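
The difference-of-squares search can be sketched in a few lines of Python. This is only an illustrative version, and the name `fermat_factor` is an assumption made here: it starts at the smallest u with $u^2 \geq n$ and increases u until $u^2 - n$ is a perfect square.

```python
from math import isqrt


def fermat_factor(n: int):
    """Write odd n > 1 as (u - v) * (u + v) by searching for u with u*u - n a square."""
    if n % 2 == 0:
        raise ValueError("Fermat's method as sketched here expects an odd n")
    u = isqrt(n)
    if u * u < n:
        u += 1                      # smallest u with u*u >= n
    while True:
        v_squared = u * u - n
        v = isqrt(v_squared)
        if v * v == v_squared:      # u*u - n is a perfect square
            return u - v, u + v
        u += 1
```

For n = 8051 the very first candidate u = 90 succeeds. For n = 1649 the search has to run from u = 41 up to u = 57, and for a prime n it runs all the way to u = (n + 1)/2 before returning the trivial factorization.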

My quadratic sieve method (which follows work of 
Kraitchik, Brillhart-Morrison, and Schroeppel) tries to 
efficiently extend Fermat’s idea to all odd composites. 
For example, take n = 1649. We start just above $n^{1/2}$
with j = 41, and consider the numbers $j^2 - 1649$. As j
runs, we will eventually hit a value where $j^2 - 1649$ is
a square, and so be able to use Fermat’s method. Let’s
try it:

$41^2 - 1649 = 32$,

$42^2 - 1649 = 115$,

$43^2 - 1649 = 200$.

Well, no squares yet, which is not surprising, since the 
Fermat method is often very poor. But wait, do the 
first and third lines not multiply together to give a 
square? Yes they do: $32 \cdot 200 = 80^2$. So, multiplying the
first and third lines, and treating them as congruences
mod 1649, we have

$(41 \cdot 43)^2 \equiv 80^2 \pmod{1649}$.

That is, we have a pair u, v with $u^2 \equiv v^2 \pmod{1649}$.
This is not quite the same as having $u^2 - v^2 = 1649$, but
we do have 1649 a divisor of $u^2 - v^2 = (u - v)(u + v)$.
Now maybe 1649 divides one of these factors, but if
it does not, then it is split between them, and so a
computation of the GCD of u - v (or u + v) with
1649 will reveal a proper factor. Now v = 80 and
$u = 41 \cdot 43 \equiv 114 \pmod{1649}$, and so we see instantly
that $u \not\equiv \pm v \pmod{1649}$, so we are in business. The
GCD of 114 - 80 = 34 with 1649 is 17. Dividing, we see
that $1649 = 17 \cdot 97$, and we are done.
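
The arithmetic of this small example is easy to replay. The following Python lines simply recompute the numbers quoted above; the variable names are chosen here for illustration.

```python
from math import gcd, isqrt

n = 1649
values = {j: j * j - n for j in (41, 42, 43)}   # the three values of j*j - n

square = values[41] * values[43]                # product of the first and third lines
v = isqrt(square)                               # its integer square root
u = (41 * 43) % n                               # u reduced mod n

congruent = (u * u - v * v) % n == 0            # u^2 and v^2 agree mod n
factor = gcd(u - v, n)                          # a proper factor of n
cofactor = n // factor
```

Here `v` comes out as 80, `u` as 114, and `factor` as 17, matching the text.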

Can we generalize this? In trying to factor n = 1649 
we considered consecutive values of the quadratic
polynomial $f(j) = j^2 - n$ for j starting just above $\sqrt{n}$ and
viewed these as congruences $j^2 \equiv f(j) \pmod{n}$. Then
we found a set M of numbers j with $\prod_{j \in M} f(j)$ equal
to a square, say $v^2$. We then let $u = \prod_{j \in M} j$, so that
$u^2 \equiv v^2 \pmod{n}$. Since $u \not\equiv \pm v \pmod{n}$, we could
split n via the GCD of u - v and n.

There is another lesson that we can learn from our 
small example with n = 1649. We used 32 and 200 to 
form our square, but we ignored 115. If we had thought
about it, we might have noticed from the start that 32
and 200 were more likely to be useful than 115. The 
reason is that 32 and 200 are smooth numbers (meaning
that they have only small prime factors), while 115
is not smooth, having the relatively large prime factor
23. Say you have k + 1 positive integers that involve
in their prime factorizations only the first k primes.
It is an easy theorem that some nonempty subset of
these numbers has product a square. The proof has
us associate with each of these numbers, which can be
written in the form $p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$, an exponent
vector $(a_1, a_2, \ldots, a_k)$. Since squares are detected by all
even exponents, we really only care whether the
exponents $a_i$ are odd or even. Thus, we think of these
vectors as having coordinates 0 and 1, and when we add
them (which corresponds to multiplying the underlying 
numbers), we do so mod 2. Since we have k + 1 vectors, 
each with only k coordinates, an easy matrix calculation 
leads quickly to a nonempty subset that adds up to the 
0-vector. The product of the corresponding integers is 
then a square. 
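
The "easy matrix calculation" can be sketched as Gaussian elimination mod 2. The Python below is an illustrative version, not a description of any production implementation: the names `exponent_vector_mod2` and `square_subset` are assumptions made here, and exponent vectors are stored as integer bitmasks so that adding mod 2 becomes XOR.

```python
from math import isqrt, prod


def exponent_vector_mod2(m, primes):
    """Bitmask whose i-th bit is the parity of the exponent of primes[i] in m."""
    mask = 0
    for i, p in enumerate(primes):
        while m % p == 0:
            m //= p
            mask ^= 1 << i
    if m != 1:
        raise ValueError("number is not smooth over the given primes")
    return mask


def square_subset(numbers, primes):
    """Return indices of a nonempty subset whose product is a square, or None."""
    basis = {}  # leading bit -> (reduced vector, bitmask of the indices combined)
    for idx, m in enumerate(numbers):
        vec = exponent_vector_mod2(m, primes)
        used = 1 << idx
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, used)   # a new independent vector
                break
            bvec, bused = basis[pivot]
            vec ^= bvec                      # add mod 2, cancelling the leading bit
            used ^= bused
        else:
            # vec reduced to the 0-vector: the recorded indices multiply to a square
            return [i for i in range(len(numbers)) if (used >> i) & 1]
    return None
```

With k primes there are only k possible leading bits, so among any k + 1 smooth numbers the loop must reach the 0-vector, which is the counting argument in the text.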

In our toy example with n = 1649, the first and 
third numbers, which are $32 = 2^5 3^0 5^0$ and
$200 = 2^3 3^0 5^2$, have exponent vectors (5,0,0) and (3,0,2),
which reduce to (1,0,0) and (1,0,0), so we see that 
the sum of them is (0,0,0), which indicates that we 
have a square. We were lucky that we could make do 
with just two vectors, instead of the four that the above 
argument shows would be sufficient. 

In general with the quadratic sieve, one finds smooth 
numbers in the sequence $j^2 - n$, forms the exponent
vectors mod 2, and then uses a matrix to find a
nonempty subset which adds up to the 0-vector, which
then corresponds to a set M for which $\prod_{j \in M} f(j)$ is a
square. 

In addition, the “sieve” in the quadratic sieve comes 
in with the search for smooth values of $f(j) = j^2 - n$.
These numbers are the consecutive values of a (quadratic)
polynomial, so those divisible by a given prime
can be found in regular places in the sequence. For
example, in our illustration, $j^2 - 1649$ is divisible by 5
precisely when $j \equiv 2$ or $3 \pmod{5}$. A sieve very much
like the sieve of Eratosthenes can then be used to
efficiently find the special numbers j where $j^2 - n$ is
smooth. A key issue, though, is how smooth a value 
f(j) has to be for us to decide to accept it. If we choose 
a smaller bound for the primes involved, we do not 
have to find all that many of them to use the matrix 
method. But such very smooth values might be very 
rare. If we use a larger bound for the primes involved, 
then smooth values of f(j) may be more common, 
but we will need many of them. Somewhere between 
smaller and larger is just right! In order to make the 
choice, it would help to know how frequently values of 
an irreducible quadratic polynomial are smooth.
Unfortunately, we do not have a theorem that tells us, but we
can still make a good choice by assuming that this fre- 
quency is about that for a random number of the same 
size, an assumption that is probably correct even if it 
is hard to prove. 
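
To make the sieving step concrete, here is a small Python sketch. It is a simplification made for illustration: it divides each prime out of the stored values, whereas practical implementations typically accumulate approximate logarithms instead. The function name `smooth_values` and its parameters are assumptions chosen here.

```python
from math import isqrt


def smooth_values(n, primes, length):
    """Find j in [j0, j0 + length) with j*j - n smooth over `primes`, by sieving."""
    j0 = isqrt(n) + 1                       # start just above the square root of n
    residual = [j * j - n for j in range(j0, j0 + length)]
    for p in primes:
        # j*j - n is divisible by p exactly when j lies in one of these classes mod p
        roots = [r for r in range(p) if (r * r - n) % p == 0]
        for r in roots:
            start = (r - j0) % p            # first index i with j0 + i congruent to r
            for i in range(start, length, p):
                while residual[i] % p == 0:
                    residual[i] //= p       # strip every power of p
    # positions reduced to 1 had only primes from the list as factors
    return [j0 + i for i in range(length) if residual[i] == 1]
```

For n = 1649 with the primes 2, 3, 5 and a window of length 3, this picks out j = 41 and j = 43 and rejects j = 42, in line with the toy example. Only the positions in the right residue classes are ever touched, which is where the efficiency of sieving comes from.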

Finally, note that if the final GCD yields only a trivial 
factor with n, one can continue just a bit longer and find 
more linear dependencies, each with a fresh chance at 
splitting n. 

These thoughts lead us to a time bound of about 
$\exp(\sqrt{\log n \log\log n})$

for the quadratic sieve to factor n. Instead of being 
exponential in the number of digits of n, as with trial 
division, this is exponential in about the square root 
of the number of digits of n. This is certainly a huge 
improvement, but it is still a far cry from polynomial 
time. 

Lenstra and I actually have a rigorous random fac- 
toring method with the same time complexity as that 
above for the quadratic sieve. (It is random in the sense 
that a coin is flipped at various junctures, and decisions 
on what to do next depend on the outcomes of these 
flips. Through this process, we expect to get a bona fide 
factorization within the advertised time bound.) How- 
ever, the method is not so computer practical, and if 
you had to choose in practice between the two, then 
you should go with the nonrigorous quadratic sieve. A 
triumph for the quadratic sieve was the 1994 factor- 
ization of the 129-digit RSA cryptographic challenge 
first published in Martin Gardner’s column in Scientific 
American in 1977. 

The number field sieve, which is another sieve-based 
factoring algorithm, was discovered in the late 1980s 
by Pollard for integers close to powers, and later
developed by Buhler, Lenstra, and me for general integers.
The method is similar in spirit to the quadratic sieve,
but assembles its squares from the product of certain
sets of algebraic integers. The number field sieve has a
conjectured time complexity of the type
$\exp(c (\log n)^{1/3} (\log\log n)^{2/3})$,
for a value of c slightly below 2. For composite numbers
beyond 100 digits or so that have no small prime factor, 
it is the method of choice, with the current record being 
200 decimal digits. 

The sieve-based factorization methods share the 
property that if you use them, then all composite
numbers of about the same size are equally hard to
factor. For instance, factoring n will be about as difficult
if n is a product of five primes each roughly near the
fifth root of n as it will be if n is a product of two 
primes roughly near the square root of n. This is quite 
unlike trial division, which is happiest when there is 
a small prime factor. We will now describe a famous 
factorization method due to Lenstra that detects small 
prime factors before large ones, and beyond baby cases 
is much superior to trial dividing. This is his elliptic 
curve method. 

Just as the quadratic sieve searches for a number 
m with a nontrivial GCD with n, so does the elliptic 
curve method. But where the quadratic sieve painstak- 
ingly builds up to a successful m from many small suc- 
cesses, the elliptic curve method hopes to hit upon m 
with essentially one lucky choice. 

Choosing random numbers m and testing their GCD 
with n can also have instant success, but you can well 
imagine that if n has no small prime factors, then the 
expected time for success would be enormous. Instead, 
the elliptic curve method involves considerably more 
cleverness. 

Consider first the “p - 1 method” of Pollard. Suppose 
you have a number n you wish to factor and a certain 
large number k. Unbeknownst to you, n has a prime 
factor p with p - 1 a divisor of k, and another prime
factor q with q - 1 not a divisor of k. You can use this
imbalance to split n. First of all, by Fermat’s little
theorem there are many numbers u with $u^k \equiv 1 \pmod{p}$
and $u^k \not\equiv 1 \pmod{q}$. Say you have one of these, and let
m be $u^k - 1$ reduced mod n. Then the GCD of m and n
is a nontrivial factor of n; it is divisible by p but not by
q. Pollard suggests taking k as the least common
multiple of the integers to some moderate bound so that it
has many divisors and perhaps a decent chance that it
is divisible by p - 1. The best case of Pollard’s method
is when n has a prime factor p with p - 1 smooth (has
all small prime factors— see the quadratic sieve discus- 
sion above). But if n has no prime factors p with p - 1 
smooth, Pollard’s method fares poorly. 
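
A minimal Python sketch of the p - 1 idea follows. As a simplifying assumption it uses the exponent k = B! (raising to 2, 3, ..., B in turn) instead of the least common multiple of the integers up to B; since B! is a multiple of that least common multiple, the same argument applies. The name `pollard_p_minus_1` and the default base u = 2 are choices made here for illustration.

```python
from math import gcd


def pollard_p_minus_1(n, bound, u=2):
    """Try to split n with exponent k = bound!; return a nontrivial factor or None."""
    x = u % n
    for q in range(2, bound + 1):
        x = pow(x, q, n)            # after the loop, x = u^(bound!) mod n
    m = (x - 1) % n                 # m = u^k - 1 reduced mod n
    g = gcd(m, n)
    return g if 1 < g < n else None


small = pollard_p_minus_1(1649, 4)   # 17: 16 divides 4! = 24, and 2^24 is not 1 mod 97
large = pollard_p_minus_1(1649, 6)   # None: 2^720 is 1 mod both 17 and 97
```

The second call shows why the imbalance matters: once k is large enough that $u^k \equiv 1$ modulo every prime factor, the GCD is n itself and nothing is learned.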

What is going on here is that corresponding to the 
prime p we have the multiplicative group [1.3 §2.1] of 
the p - 1 nonzero residues mod p. Furthermore, when
doing arithmetic mod n with numbers relatively prime
to n, we are, whether we realize it or not, doing
arithmetic in this group. We are exploiting the fact that $u^k$
is the group identity mod p, but not mod q.

Lenstra had the brilliant idea of using the Pollard 
method in the context of elliptic curve groups. There 
are many elliptic curve groups associated with the 
prime p, and therefore many chances to hit upon one 
where the number of elements is smooth. Of great
importance here are theorems of Hasse and Deuring.
An elliptic curve [III.21] mod p (for p > 3) can be
taken as the set of solutions to the congruence
$y^2 \equiv x^3 + ax + b \pmod{p}$, for given integers a, b with
the property that $x^3 + ax + b$ does not have repeated
roots mod p. There is one additional “point at infinity”
thrown in (see below). A fairly simple addition law
(but not as simple as adding coordinatewise!) makes
the elliptic curve into a group, with the point at infinity
as the identity (see rational points on curves and
the Mordell conjecture [V.32]). Hasse, in a result
later generalized by Weil [VI.93] with his famous proof
of the “Riemann hypothesis for curves,” showed us
that the number of elements in the elliptic curve group
always lies between $p + 1 - 2\sqrt{p}$ and $p + 1 + 2\sqrt{p}$ (see
the Weil conjectures [V.38]). And Deuring proved
that every number in this range is indeed associated 
with some elliptic curve mod p. 

Say we randomly choose integers $x_1$, $y_1$, a, and then
choose b so that $y_1^2$ is congruent to $x_1^3 + a x_1 + b$
(mod n). This gives us the curve with coefficients a, b
and a point $P = (x_1, y_1)$ on the curve. One can then
mimic the Pollard strategy, with a number k as before
with many divisors, and with the point P playing the
role of u. Let kP denote the k-fold sum of P added to
itself using elliptic curve addition. If kP is the point at
infinity on the curve considered mod p (which it will be
if the number of points on the curve is a divisor of k),
but not on the curve considered mod q, then this gives
us a number m whose GCD with n is divisible by p and
not by q. We will have factored n.

To see where m comes from it is convenient to con- 
sider the curve projectively: we take solutions (x, y, z) 
of the congruence $y^2 z \equiv x^3 + a x z^2 + b z^3 \pmod{p}$.
The triple (cx, cy, cz) when $c \not\equiv 0$ is considered to be
the same as (x, y, z). The mysterious point at infinity
is now demystified; it is just (0, 1, 0). And our point P is
$(x_1, y_1, 1)$. (This is the mod p version of classical
projective geometry [I.3 §6.7].) Say we work mod n and
compute the point $kP = (x_k, y_k, z_k)$. Then the
candidate for the number m is just $z_k$. Indeed, if kP is the
point at infinity mod p, then $z_k \equiv 0 \pmod{p}$, and if it
is not the point at infinity mod q, then $z_k \not\equiv 0 \pmod{q}$.
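
The following Python sketch shows the shape of the elliptic curve method. It departs from the projective description above in one respect, made here for brevity: it uses affine coordinates, so the number m shows up as a denominator that cannot be inverted mod n rather than as the coordinate $z_k$. The curves are taken in a fixed order, $y^2 = x^3 + ax - a$ through the point (1, 1), instead of at random, and the exponent is B! as in the p - 1 sketch. All names (`FactorFound`, `ec_add`, `ec_mul`, `ecm`) are illustrative assumptions.

```python
from math import gcd


class FactorFound(Exception):
    """Raised when a denominator shares a factor with n."""

    def __init__(self, g):
        super().__init__(g)
        self.g = g


def ec_add(P, Q, a, n):
    """Add points on y^2 = x^3 + a*x + b modulo n (None is the point at infinity)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if P == Q:
        num, den = 3 * x1 * x1 + a, 2 * y1      # tangent slope
    else:
        num, den = y2 - y1, x2 - x1             # chord slope
    den %= n
    g = gcd(den, n)
    if g != 1:
        raise FactorFound(g)                    # the inverse of den does not exist
    lam = num * pow(den, -1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return x3, (lam * (x1 - x3) - y1) % n


def ec_mul(k, P, a, n):
    """Compute kP by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R


def ecm(n, bound=20, curves=50):
    """Try the curves y^2 = x^3 + a*x - a with P = (1, 1); return a factor or None."""
    for a in range(1, curves + 1):
        P = (1, 1)
        try:
            for q in range(2, bound + 1):       # builds up (bound!) * P step by step
                P = ec_mul(q, P, a, n)
                if P is None:
                    break
        except FactorFound as found:
            if found.g < n:
                return found.g                  # nontrivial factor of n
    return None
```

When one curve fails, the loop simply moves on to the next value of a, which is the "fresh chance" that the p - 1 method lacks. This sketch omits refinements such as checking the discriminant of each curve or using a second stage.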

When Pollard’s p - 1 method fails, our only recourse 
is to raise k or give up. With the elliptic curve method, if
things do not work for our randomly chosen curve, we
can pick another. Corresponding to the hidden prime p
in n, we are actually picking new elliptic curve groups
mod p, and so gaining a fresh chance for the number of
elements in the group to be smooth. The elliptic curve
method has been quite successful in factoring numbers 
which have a prime factor up to about fifty decimal dig- 
its, and occasionally even somewhat larger primes have 
been discovered. 

We conjecture that the expected time for the elliptic 
curve method to find the least prime factor p of n is 
about 

$\exp(\sqrt{2 \log p \log\log p})$

arithmetic operations mod n. What is holding us back 
from proving this conjecture is not lack of knowledge 
about elliptic curves, but rather lack of knowledge of 
the distribution of smooth numbers. 

For more on these and other factorization meth- 
ods, the reader is referred to Crandall and Pomerance 
(2005). 

4 The Riemann Hypothesis and 
the Distribution of the Primes 

As a teenager looking at a modest table of primes, 
Gauss conjectured that their frequency decays
logarithmically and that $\mathrm{li}(x) = \int_2^x (1/\log t)\,dt$ should be a
good approximation for $\pi(x)$, the number of primes
between 1 and x. Sixty years later, Riemann [VI.49]
showed how Gauss’s conjecture can be proved if one
assumes that the Riemann zeta function $\zeta(s) = \sum_n n^{-s}$
has no zeros in the complex half-plane where the real
part of s is greater than $\frac{1}{2}$. The series for $\zeta(s)$ converges
only for Re s > 1, but it may be analytically continued to
Re s > 0, with a simple pole at s = 1. (For a brief
description of the process of analytic continuation, see some
fundamental mathematical definitions [I.3 §5.6].)
This continuation may be seen quite concretely via the
identity $\zeta(s) = s/(s - 1) - s \int_1^\infty \{x\} x^{-s-1}\,dx$, with {x}
the fractional part of x (so that {x} = x - [x]): note
that this integral converges quite nicely in the
half-plane Re s > 0. In fact, via Riemann’s functional
equation mentioned below, $\zeta(s)$ can be continued to a
meromorphic function in the whole complex plane, with the
single pole at s = 1.

The assertion that $\zeta(s) \neq 0$ for Re s > $\frac{1}{2}$ is known
as the Riemann hypothesis [IV.2 §3]; arguably it is
the most famous unsolved problem in mathematics.
Though Hadamard [VI.65] and de la Vallée Poussin
[VI.67] were able in 1896 to prove (independently) a
weak form of Gauss's conjecture known as the prime
number theorem [V.29], the apparent breathtaking
strength of the approximation li(x) to $\pi(x)$ is uncanny.
For example, take $x = 10^{22}$. We have

$\pi(10^{22})$ = 201 467 286 689 315 906 290

exactly, and, to the nearest integer, we have

$\mathrm{li}(10^{22}) \approx$ 201 467 286 691 248 261 497.

As you can plainly see, Gauss’s guess is right on the 
money! 

The numerical computation of li(x) is simple via 
numerical methods for integration, and it is directly 
obtainable in various mathematics computing
packages. However, the computation of $\pi(10^{22})$ (due to
Gourdon) is far from trivial. It would be far too
laborious to count these approximately $2 \times 10^{20}$ primes
one by one, so how are they counted? In fact, we have
various combinatorial tricks to count without listing
everything. For example, one does not need to count
one by one to see that there are exactly $2[10^{22}/6] + 1$
integers in the interval from 1 to $10^{22}$ that are
relatively prime to 6. Rather, one thinks of these numbers
grouped in blocks of six, with two in each block coprime
to 6. (The “+1” comes from the partial block at the end.)
Building on early ideas of Meissel and Lehmer, Lagarias,
Miller, and Odlyzko presented an elegant combinatorial
method for computing $\pi(x)$ that takes about $x^{2/3}$
elementary steps. The method was refined by Deléglise
and Rivat, and then Gourdon found a way to distribute
the computation to many computers. 

From work of von Koch, and later Schoenfeld, we 
know that the Riemann hypothesis is equivalent to the 
assertion that 

$|\pi(x) - \mathrm{li}(x)| < \sqrt{x} \log x$ (1)

for all $x \geq 3$ (see Crandall and Pomerance 2005,
exercise 1.37). Thus, the mammoth calculation of $\pi(10^{22})$
might be viewed as computational evidence for the
Riemann hypothesis. In fact, if the count had turned out
to violate (1), we would have had a disproof. 
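
On a far more modest scale, inequality (1) can be checked directly. The Python sketch below counts primes with a plain sieve of Eratosthenes, not the combinatorial methods mentioned above, and approximates the logarithmic integral by Simpson's rule. It assumes the integral is taken from 2 to x; the function names and the step count are choices made here for illustration.

```python
from math import exp, isqrt, log, sqrt


def prime_count(x):
    """pi(x) for an integer x >= 2, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)


def li(x, steps=20000):
    """Integral of dt / log t from 2 to x, by Simpson's rule after substituting t = e^u."""
    lo, hi = log(2), log(x)
    h = (hi - lo) / steps                    # steps must be even for Simpson's rule
    total = exp(lo) / lo + exp(hi) / hi
    for i in range(1, steps):
        u = lo + i * h
        total += (4 if i % 2 else 2) * exp(u) / u
    return total * h / 3


def bound_holds(x):
    """Check |pi(x) - li(x)| < sqrt(x) * log(x) for one value of x."""
    return abs(prime_count(x) - li(x)) < sqrt(x) * log(x)
```

For x up to a million the difference is tiny compared with $\sqrt{x} \log x$, which gives a small-scale feel for what the calculation at $10^{22}$ is testing.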

It may not be obvious what (1) has to do with the
location of the zeros of $\zeta(s)$. To understand the connection,
let us first dismiss the so-called “trivial” zeros, which
occur at each negative even integer. The nontrivial
zeros $\rho$ are known to be infinite in number, and, as
mentioned above, are conjectured to satisfy Re $\rho \leq \frac{1}{2}$. There
are certain symmetries among these zeros: indeed, if $\rho$
is a zero, then so are $\bar\rho$, $1 - \rho$, and $1 - \bar\rho$. Therefore, the
Riemann hypothesis is the assertion that every
nontrivial zero has real part equal to $\frac{1}{2}$. (The symmetry with
$\rho$ and $1 - \rho$, which follows from Riemann’s functional
equation $\zeta(1 - s) = 2(2\pi)^{-s} \cos(\frac{1}{2}\pi s)\Gamma(s)\zeta(s)$,
perhaps provides some heuristic support for the Riemann
hypothesis.) 

The connection to prime numbers begins with the 
FUNDAMENTAL THEOREM OF ARITHMETIC [V.16], which 
yields the identity 

$\zeta(s) = \sum_{n=1}^{\infty} n^{-s} = \prod_{p\ \mathrm{prime}} \sum_{j=0}^{\infty} p^{-js} = \prod_{p\ \mathrm{prime}} (1 - p^{-s})^{-1}$,

a product that converges when Re s > 1. Thus, taking
the logarithmic derivative (that is, taking the logarithm
of both sides and then differentiating), we have

$\frac{\zeta'(s)}{\zeta(s)} = -\sum_{p\ \mathrm{prime}} \frac{\log p}{p^s - 1} = -\sum_{p\ \mathrm{prime}} \sum_{j=1}^{\infty} \frac{\log p}{p^{js}}$.

That is, if we define $\Lambda(n)$ to be log p if $n = p^j$ for a
prime p and an integer $j \geq 1$, and $\Lambda(n) = 0$ if n is not
of this form, then we have the identity

$\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} = -\frac{\zeta'(s)}{\zeta(s)}$.

Through various relatively routine calculations, one can
then relate the function

$\psi(x) = \sum_{n \leq x} \Lambda(n)$

to the residues at the poles of $\zeta'/\zeta$, which correspond
to the zeros (and single pole) of $\zeta$. In fact, as Riemann
showed, we have the following beautiful formula:

$\psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \log(2\pi) - \frac{1}{2} \log(1 - x^{-2})$

if x itself is not a prime or prime power, and where the
sum over the nontrivial zeros $\rho$ of $\zeta$ is to be understood
in the symmetric sense where we sum over those $\rho$ with
$|\mathrm{Im}\,\rho| < T$ and let $T \to \infty$. Through elementary
manipulations, an understanding of the function $\psi(x)$
readily gives an equivalent understanding of $\pi(x)$, and it
should be clear now that $\psi(x)$ is intimately connected
to the nontrivial zeros $\rho$ of $\zeta$.

The function $\psi(x)$ defined above has a simple
interpretation. It is the logarithm of the least common
multiple of the integers in the interval [1, x]. As with
(1) we have an elementary translation of the Riemann
hypothesis: it is equivalent to the assertion that

$|\psi(x) - x| < \sqrt{x} \log^2 x$

for all $x \geq 3$. This inequality involves only the
elementary concepts of least common multiple, natural
logarithm, absolute value, and square root, yet it is
equivalent to the Riemann hypothesis. 
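
Both descriptions of this function, and the inequality itself, are easy to try out for small x. The Python sketch below is illustrative only and the function names are assumptions made here; it relies on `math.lcm`, which needs Python 3.9 or later.

```python
from functools import reduce
from math import isqrt, lcm, log, sqrt


def psi_from_lcm(x):
    """The logarithm of lcm(1, 2, ..., x)."""
    return log(reduce(lcm, range(1, x + 1), 1))


def psi_from_lambda(x):
    """The sum of log p over all prime powers p^j <= x."""
    total = 0.0
    for p in range(2, x + 1):
        if all(p % d for d in range(2, isqrt(p) + 1)):   # p is prime
            q = p
            while q <= x:            # each prime power p^j <= x contributes log p
                total += log(p)
                q *= p
    return total


def psi_bound_holds(x):
    """Check |psi(x) - x| < sqrt(x) * log(x)^2 for one integer x."""
    return abs(psi_from_lcm(x) - x) < sqrt(x) * log(x) ** 2
```

The two functions agree up to rounding because the least common multiple of 1, ..., x contains each prime p to the largest power not exceeding x.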

A number of nontrivial zeros $\rho$ of $\zeta(s)$ have actually
been calculated and it has been verified that they lie on
the line Re s = $\frac{1}{2}$. One might wonder how someone can
computationally verify that a complex number $\rho$ has
Re $\rho = \frac{1}{2}$. For example, suppose that we are carrying
calculations to (an unrealistically large) $10^{10}$ significant
digits, and suppose we come across a zero with real
part $\frac{1}{2} + 10^{-10^{100}}$. It would be far beyond the precision
of the calculation to be able to distinguish this
number from $\frac{1}{2}$ itself. Nevertheless, we do have a method
for seeing if particular zeros $\rho$ satisfy Re $\rho = \frac{1}{2}$. There
are two ideas involved, one of which comes from
elementary calculus. If we have a continuous real-valued
function f(x) defined on the real numbers, we can
sometimes use the intermediate value theorem to count
zeros. For example, say f(1) > 0, f(1.7) < 0, f(2.3) > 0.
Then we know for sure that f has at least one zero
between 1 and 1.7, and at least one zero between 1.7
and 2.3. If we know for other reasons that f has exactly
two zeros, then we have accounted for both of them.
To locate zeros of the complex function $\zeta(s)$, a
real-valued function g(t) is constructed with the property
that $\zeta(\frac{1}{2} + it) = 0$ if and only if g(t) = 0. By looking at
sign changes for g(t) for 0 < t < T, we can get a lower
bound for the number of zeros $\rho$ of $\zeta$ with Re $\rho = \frac{1}{2}$
and 0 < Im $\rho$ < T. In addition, we can use the so-called
argument principle from complex analysis to count the
exact number of zeros with 0 < Im $\rho$ < T. If we are
lucky and this exact count is equal to our lower bound,
then we have accounted for all of $\zeta$’s zeros here,
showing that they all have real part $\frac{1}{2}$ (and, in addition, that
they are all simple zeros). If the counts did not match,
it would not be a disproof of the Riemann hypothesis,
but certainly it would indicate a region where we should
be checking the data more closely. So far, whenever
we have tried this approach, the counts have matched,
though sometimes we have been forced to evaluate g(t)
at very closely spaced points.

The first few nontrivial zeros were computed by Rie- 
mann himself. The famous cryptographer and early 
computer scientist alan turing [VI.94] also computed 
some zeta zeros. The current record for this kind of 
calculation is held by Gourdon, who has shown that 
the first $10^{13}$ zeta zeros with positive imaginary part
all have real part equal to $\frac{1}{2}$, as predicted by Riemann.
Gourdon’s method is a modification of that pioneered
by Odlyzko and Schönhage (1988), who ushered in the
modern age of zeta-zero calculations. 


Explicit zeta-function calculations can lead to highly 
useful explicit prime number estimates. If $p_n$ is the
nth prime, then the prime number theorem implies
that $p_n \sim n \log n$ as $n \to \infty$. Actually, there is a
secondary term of order n log log n, and so for all
sufficiently large n, we have $p_n > n \log n$. By using explicit
zeta estimates, Rosser was able to put a numerical
bound on the “sufficiently large” in this statement, and
then, by checking small cases, was able to prove that
in fact $p_n > n \log n$ for every n. The paper of Rosser
and Schoenfeld (1962) is filled with highly useful and 
numerically explicit inequalities of this kind. 
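
The "checking small cases" part of such an argument is straightforward to imitate. The Python sketch below generates the first few thousand primes by trial division and tests Rosser's inequality for each; the function names and the choice of 2000 primes are illustrative assumptions made here.

```python
from math import log


def first_primes(count):
    """Return the first `count` primes, by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def rosser_holds(count):
    """Check p_n > n log n for n = 1, ..., count."""
    return all(p > n * log(n) for n, p in enumerate(first_primes(count), start=1))


checked = rosser_holds(2000)
```

A finite check like this proves nothing beyond its range, of course; the explicit zeta estimates are what cover the infinitely many remaining cases.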

Let us imagine for a moment that the Riemann 
hypothesis had been proved. Mathematics is never 
“used up,” as there is always that next problem around 
the bend. Even if we know that all of zeta's nontrivial
zeros lie on the line Re s = $\frac{1}{2}$, we can still ask how
they are distributed on this line. We have a fairly
concise understanding of how many zeros there should
be up to a given height T. In fact, as already found by
Riemann, this count is about $(1/2\pi) T \log T$. Thus, on
average, the zeros would tend to get closer and closer
with about $(1/2\pi) \log T$ of them in a unit interval near
height T. 

This tells us the average distance, or spacing, 
between one zeta zero and the next, but there is much
more that one can ask about how these spacings are
distributed. In order to discuss this question, it is very
convenient to “normalize” the spacings, so that the
average (normalized) gap between consecutive zeros
is 1. By Riemann’s result, together with our
assumption of the Riemann hypothesis, this can be done if
we multiply a gap near T by $(1/2\pi) \log T$, or,
equivalently, if for each zero $\rho$ we replace its imaginary
part t = Im $\rho$ by $(1/2\pi) t \log t$. In this way we arrive
at a sequence $\delta_1, \delta_2, \ldots$ of normalized gaps between
consecutive zeros, which on average are about 1.

Checking numerically, we see that some $\delta_n$ are large,
with others close to 0; it is just the average that is 1. 
Mathematics is well equipped to study random phe- 
nomena, and we have names for various probability 
distributions [III.73], such as Poisson, Gaussian, etc. 
Is this what is happening here? These zeta zeros are 
not random at all, but perhaps thinking in terms of 
randomness has promise. 

In the early twentieth century, hilbert [VI.63] and 
Pólya suggested that the zeros of the zeta function
might correspond to the eigenvalues [I.3 §4.3] of some
operator [III.52]. Now this is provocative! But what
operator? Some fifty years later in a now famous
conversation between Dyson and Montgomery at the
Institute for Advanced Study, it was conjectured that the
nontrivial zeros behave like the eigenvalues of a
random matrix from the so-called Gaussian unitary
ensemble. This conjecture, now known as the GUE conjecture,
can be numerically tested in various ways. Odlyzko has
done this, and found persuasive evidence for the
conjecture: the higher the batches of zeros one looks at, the
more closely their distribution corresponds to what the
GUE conjecture predicts.

Figure 1 Nearest-neighbor spacing and the Gaudin distribution.

For example, take the 1 041 417 089 numbers $\delta_n$ with
n starting at $10^{23}$ + 17 368 588 794. (The imaginary
parts of these zeros are around $1.3 \times 10^{22}$.) For each
interval (j/100, (j + 1)/100] we can compute the
proportion of these normalized gaps that lie in this
interval, and plot it. If we were dealing with eigenvalues
from a random matrix from the GUE, we would expect
these statistics to converge to a certain distribution
known as the Gaudin distribution (for which there is
no closed formula, but which is easily computable).
Odlyzko has kindly supplied me with the graph in
figure 1, which plots the Gaudin distribution against the
data just described (but leaves out every second data
point to avoid clutter). Like pearls on a necklace! The
fit is absolutely remarkable. 

The vital interplay of thought experiments and 
numerical computation has taken us to what we feel is 
a deeper understanding of the zeta function. But where
do we go next? The GUE conjecture suggests a
connection to random matrix theory, and pursuing further
connections seems promising to many. It may be that 
random matrix theory will allow us only to formulate 
great conjectures about the zeta function, and will not 
lead to great theorems. But then again, who can deny 
the power of a glimpse at the truth? We await the next 
chapter in this development. 

5 Diophantine Equations 
and the ABC Conjecture 

Let us move now from the Riemann hypothesis to
Fermat’s last theorem [V.12]. Until the last decade it
too was one of the most famous unsolved problems
in mathematics, once even having a mention on an
episode of Star Trek. The assertion is that the
equation $x^n + y^n = z^n$ has no solutions in positive integers
x, y, z, n, where $n \geq 3$. This conjecture had remained
unproved for three and a half centuries until Andrew
Wiles published a proof in 1995. In addition, perhaps
more important than the solution of this particular
Diophantine equation (that is, an equation where the
unknowns are restricted to the integers), the
centuries-long quest for a proof helped establish the field of
algebraic number theory [IV.1]. And the proof itself
established a long-sought and wonderful connection 
between modular forms [III.61] and elliptic curves. 

But do you know why Fermat’s last theorem is true? 
That is, just in case you are not an expert on all of the 
intricacies of the proof, are you surprised that there 
are in fact no solutions? In fact, there is a fairly simple
heuristic argument that supports the assertion. First
note that the case n = 3, namely $x^3 + y^3 = z^3$, can be
handled by elementary methods, and this in fact had
already been done by Euler [VI.19]. So, let us focus on
the cases when $n \geq 4$.¹ Let $S_n$ be the set of positive nth
powers of integers. How likely is it that the sum of two
members of $S_n$ is itself a member of $S_n$? Well, not at
all likely, since Wiles has proved that this never occurs! 
But recall that we are trying to think naively. 

Let us try to mimic our situation by replacing the set
$S_n$ with a random set. In fact, we will throw all of the
powers together into one set. Following an idea of Erdős
and Ulam (1971) we create a set $\mathcal{R}$ by a random process:
each integer m is considered independently, and the
chance it gets thrown into $\mathcal{R}$ is proportional to $m^{-3/4}$.
This process would typically give us about $x^{1/4}$
numbers in $\mathcal{R}$ in the interval [1, x], or at least this would be
the order of magnitude. Now the total number of fourth
and higher powers between 1 and x is also about $x^{1/4}$,
so we can take our random set $\mathcal{R}$ as modeling the
situation for these powers, namely the union of all sets
$S_n$ for $n \geq 4$. We ask how likely it is to have a + b = c
where a, b, and c all come from $\mathcal{R}$.

1. Actually, Fermat himself had a simple proof in the case n = 4,
but we ignore this.
The probability that a number m may be represented
as a + b with 0 < a < b < m and $a, b \in \mathcal{R}$ is
proportional to $\sum_{0 < a < m/2} a^{-3/4} (m - a)^{-3/4}$, since for each a
less than m the probability that a and m - a both lie in
$\mathcal{R}$ is $a^{-3/4} (m - a)^{-3/4}$. Actually, there is a minor caveat
when m is even, since then a = m - a when $a = \frac{1}{2}m$:
to cover this, we add the single term $(\frac{1}{2}m)^{-3/4}$ to the
above sum. Replacing each m - a in the sum with $\frac{1}{2}m$,
we get a larger sum that is easy to estimate and turns
out to be proportional to $m^{-1/2}$. That is, the chance
that a random number m is a sum of two members of
$\mathcal{R}$ is at most a certain quantity that is proportional to
$m^{-1/2}$. Now the events that would have to occur for
m to be given as such a sum involve numbers smaller
than m, so the event that m itself is in $\mathcal{R}$ is
independent of these. Therefore, the probability that m is not
only the sum of two members of $\mathcal{R}$, but also itself a
member of $\mathcal{R}$, is at most a quantity proportional to
$m^{-1/2} m^{-3/4} = m^{-5/4}$. So now we can count how many
times we should expect a sum of two members of $\mathcal{R}$ to
itself be a member of $\mathcal{R}$. This is at most a constant times
$\sum_m m^{-5/4}$. But this sum is convergent, so we expect
only finitely many examples. Further, since the tail of 
a convergent series is tiny, we do not expect any large 
examples. 

Thus, this argument suggests that there are at most 
finitely many positive integer solutions to 

$x^u + y^v = z^w$, (2)

where the exponents u, v, w are at least 4. Since
Fermat’s last theorem is the special case when u = v = w,
we would have at most finitely many counterexamples 
to that as well. 

This seems tidy enough, but now we get a surprise! 
There are actually infinitely many solutions to (2) in pos- 
itive integers with u,v,w all at least 4. For example, 
note that $17^4 + 34^4 = 17^5$. This is the case a = 1, b = 2,
u = 4 of a more general identity: if a, b are positive
integers, and $c = a^u + b^u$, we have $(ac)^u + (bc)^u = c^{u+1}$.
Another way to get infinitely many examples is to build
on the possible existence of just one example. If x, y,
z, u, v, w are positive integers satisfying (2), then with
the same exponents, we may replace x, y, z with $a^{vw} x$,
$a^{uw} y$, $a^{uv} z$ for any integer a, and so get infinitely
many solutions. 
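Both constructions are easy to check with exact integer arithmetic. The sketch below is illustrative only: `family_solution` and `rescale` are hypothetical names, and the starting solution $2^5 + 7^2 = 3^4$ used for the rescaling is just a convenient example with unequal exponents.

```python
def family_solution(a, b, u):
    """Return (X, Y, Z) with X^u + Y^u = Z^(u+1), built from c = a^u + b^u."""
    c = a**u + b**u
    return a * c, b * c, c


def rescale(x, y, z, u, v, w, a):
    """Multiply a solution of x^u + y^v = z^w through by a^(uvw)."""
    return a**(v * w) * x, a**(u * w) * y, a**(u * v) * z


X, Y, Z = family_solution(1, 2, 4)
print(X, Y, Z, X**4 + Y**4 == Z**5)

# Start from 2^5 + 7^2 = 3^4 and rescale with a = 3.
x, y, z = rescale(2, 7, 3, 5, 2, 4, 3)
print(x**5 + y**2 == z**4)
```

Note that every solution produced this way has a common factor, which is exactly the dependence the heuristic ignores.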

The point is that events of the kind that we are considering
(that a given integer is a power) are not quite
independent. For instance, if $A$ and $B$ are both $u$th powers,
then so is $AB$, and this idea is exploited in the
infinite families just mentioned.

So how do we neatly bar these trivialities and come to 
the rescue of our heuristic argument? One simple way 
to do this is to insist that the numbers x, y, z in (2) be 
relatively prime. This gives no restriction whatsoever in 
the Fermat case of equal exponents, since a solution to
$x^n + y^n = z^n$ with $d$ the greatest common divisor of $x$,
$y$, $z$ leads to the coprime solution $(x/d)^n + (y/d)^n = (z/d)^n$.

Concerning Fermat’s last theorem, one might ask 
how far it had actually been verified before the final 
proof by Wiles. The paper by Buhler et al. (1993) reports
a verification for all exponents $n$ up to 4 000 000. This
type of calculation, which is far from trivial, has its
roots in nineteenth-century work of Kummer [VI.40]
and early-twentieth-century work of Vandiver. In fact,
Buhler et al. (1993) also verify in the same range
a related conjecture of Vandiver dealing with cyclotomic
fields, but this conjecture may in fact be false
in general.

The probabilistic thinking above, combined with 
computation of small cases, can carry us deeply into 
some very provocative conjectures. The above prob- 
abilistic argument can easily be extended to suggest 
that (2) has at most finitely many relatively prime solutions
$x$, $y$, $z$ over all possible exponent triples $u$, $v$,
$w$ with $1/u + 1/v + 1/w < 1$. This conjecture has
come to be known as the Fermat-Catalan conjecture,
since it contains within it essentially Fermat’s last theorem
and also the Catalan conjecture (recently proved
by Mihăilescu) that 8 and 9 are the only consecutive
powers.

It is good that we do allow for the possibility that 
there are some solutions, and this is where our main 
topic of computing comes in. For example, since $1 + 8 = 9$,
we have a solution to $x^7 + y^3 = z^2$, where $x = 1$,
$y = 2$, and $z = 3$. (The exponent 7 is chosen to ensure
that the reciprocal sum of the exponents is less than 1.
Of course, we could replace 7 by any larger integer, but
since in each case the power involved is the number
1, they should all together be considered as just one
example.) Here are the known solutions to (2):

$1^n + 2^3 = 3^2$,

$2^5 + 7^2 = 3^4$,

$13^2 + 7^3 = 2^9$,

$2^7 + 17^3 = 71^2$,

$3^5 + 11^4 = 122^2$,

$33^8 + 1549034^2 = 15613^3$,

$1414^3 + 2213459^2 = 65^7$,

$9262^3 + 15312283^2 = 113^7$,

$17^7 + 76271^3 = 21063928^2$,

$43^8 + 96222^3 = 30042907^2$.

The larger members were found in an exhaustive com- 
puter search by Beukers and Zagier. Perhaps this is the 
complete list of all solutions, or perhaps not— we have 
no proof. 
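Each entry in this list can be confirmed directly with exact integer arithmetic. The following sketch checks the identity, the coprimality of the bases, and the condition $1/u + 1/v + 1/w < 1$; the names `KNOWN` and `check` are hypothetical, and the exponent 7 is used as a stand-in for $n$ in the first entry.

```python
from math import gcd

# Each tuple (x, u, y, v, z, w) encodes x^u + y^v = z^w.
KNOWN = [
    (1, 7, 2, 3, 3, 2),
    (2, 5, 7, 2, 3, 4),
    (13, 2, 7, 3, 2, 9),
    (2, 7, 17, 3, 71, 2),
    (3, 5, 11, 4, 122, 2),
    (33, 8, 1549034, 2, 15613, 3),
    (1414, 3, 2213459, 2, 65, 7),
    (9262, 3, 15312283, 2, 113, 7),
    (17, 7, 76271, 3, 21063928, 2),
    (43, 8, 96222, 3, 30042907, 2),
]


def check(x, u, y, v, z, w):
    """True if the identity holds, the bases are coprime, and 1/u + 1/v + 1/w < 1."""
    holds = x**u + y**v == z**w
    coprime = gcd(gcd(x, y), z) == 1
    # Clearing denominators: 1/u + 1/v + 1/w < 1 is the same as vw + uw + uv < uvw.
    small_reciprocal_sum = v * w + u * w + u * v < u * v * w
    return holds and coprime and small_reciprocal_sum


for entry in KNOWN:
    print(entry, check(*entry))
```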

However, for particular choices of u,v,w, more can 
be said. Using results from a famous paper of Faltings, 
Darmon and Granville (1995) have shown that for any 
fixed choice of u, v,w with reciprocal sum at most 1, 
there are at most finitely many coprime triples x, y, 
z solving (2). For a particular choice of exponents, one 
might try to actually find all of the solutions. If it can
be handled at all, this task can involve a delicate interplay
between arithmetic geometry [IV.5], effective
methods in transcendental number theory, and good 
hard computing. In particular, the exponent triple sets 
{2,3,7}, {2,3,8}, {2,3,9}, and {2,4,5} are known to 
have all their solutions in the above table. See Poonen 
et al. (2007) for the treatment of the case {2, 3, 7} and 
links to other work. 

The abc conjecture [V.1] of Oesterlé and Masser
is deceptively simple. It involves positive integer solu- 
tions to the equation a + b = c, hence the name. To 
put some meaning into a + b = c, we define the radi- 
cal of a nonzero integer n as the product of the primes 
that divide n, denoting this as rad(n). So, for exam- 
ple, rad(10) = 10, rad(72) = 6, and rad(65 536) = 2. 
In particular, high powers have small radicals in com- 
parison to the number itself, and so do many other 
numbers. Basically, the ABC conjecture asserts that if 
a + b = c, then the radical of abc cannot be too small. 
More specifically we have the following. 

The ABC conjecture. For each $\epsilon > 0$ there are at most
finitely many relatively prime positive integer triples $a$,
$b$, $c$ with $a + b = c$ and $\mathrm{rad}(abc) < c^{1-\epsilon}$.
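The radical is simple to compute for small numbers by trial division. The sketch below is only an illustration (the function name `rad` mirrors the notation in the text, and trial division is a convenience choice that is far too slow for large inputs); it reproduces the three examples given above.

```python
def rad(n):
    """Product of the distinct primes dividing the nonzero integer n."""
    n = abs(n)
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    # Whatever remains above 1 is a single prime factor larger than sqrt(n).
    return result * n if n > 1 else result


print(rad(10), rad(72), rad(65536))

# The triple 1 + 8 = 9 has a radical smaller than c.
a, b, c = 1, 8, 9
print(rad(a * b * c), c)
```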


Note that the ABC conjecture immediately solves the
Fermat-Catalan problem. Indeed, if $u$, $v$, $w$ are positive
integers with $1/u + 1/v + 1/w < 1$, then it is easily
found that we must have $1/u + 1/v + 1/w \le 41/42$.
Suppose we have a coprime solution to (2). Then $x \le z^{w/u}$
and $y \le z^{w/v}$, so that

$\mathrm{rad}(x^u y^v z^w) \le xyz \le z^{w(1/u + 1/v + 1/w)} \le (z^w)^{41/42}$.

Thus, the ABC conjecture with $\epsilon = 1/42$ implies that
there are at most finitely many solutions.
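The bound $41/42$ can be found by a small search with exact rational arithmetic. This is a sketch under the assumption that searching exponents up to a fixed limit is enough to locate the maximum; it is a sanity check, not a proof. The name `max_reciprocal_sum` and the limit of 60 are hypothetical choices.

```python
from fractions import Fraction


def max_reciprocal_sum(limit=60):
    """Largest value of 1/u + 1/v + 1/w below 1 with 2 <= u <= v <= w <= limit."""
    best = None
    for u in range(2, limit + 1):
        for v in range(u, limit + 1):
            for w in range(v, limit + 1):
                s = Fraction(1, u) + Fraction(1, v) + Fraction(1, w)
                if s < 1 and (best is None or s > best[0]):
                    best = (s, (u, v, w))
    return best


print(max_reciprocal_sum())
```

The maximizing triple is the exponent set of the first entry in the list of known solutions.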

The ABC conjecture has many other marvelous con- 
sequences; for a delightful survey, see Granville and 
Tucker (2002). In fact, the ABC conjecture and its generalizations
can be used to prove so many things that I
have joked that it is beginning to resemble a false state- 
ment, since a false statement implies everything. But 
probably the ABC conjecture is true. Indeed, though a 
bit harder to see, the Erdős-Ulam probabilistic argument
can be modified to provide heuristic evidence for
it too. 

Basic to this argument is a perfectly rigorous result
on the distribution of integers n for which rad(n) 
is below some bound. These ideas, which lead to a 
more explicit version of the ABC conjecture, are worked 
through in the thesis of van Frankenhuijsen and by 
Stewart and Tenenbaum. Here is a slightly weaker state- 
ment: if a + b = c are relatively prime positive integers 
and c is sufficiently large, then we have 

$\mathrm{rad}(abc) > c^{1 - 1/\sqrt{\log c}}$. (3)

One might wonder how the numerical evidence
stacks up against (3). This inequality asserts that if
$\mathrm{rad}(abc) = r$, then $\log(c/r)/\sqrt{\log c} < 1$. So, let
$T(a,b,c)$ denote the test statistic $\log(c/r)/\sqrt{\log c}$. A
Web site maintained by Nitaj (www.math.unicaen.fr/~nitaj/abc.html)
contains a wealth of information
about the ABC conjecture. Checking the data, there are
quite a few examples with $T(a,b,c) > 1$, the champion
so far being

$a = 7^2 \cdot 41^2 \cdot 311^3 = 2477678547239$,
$b = 11^{16} \cdot 13^2 \cdot 79 = 613474843408551921511$,
$c = 2 \cdot 3^3 \cdot 5^{23} \cdot 953 = 613474845886230468750$,
$r = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 41 \cdot 79 \cdot 311 \cdot 953 = 28828335646110$,

so that

$T(a,b,c) = \log(c/r)/\sqrt{\log c} = 2.43886\ldots$.

Is it always true that $T(a,b,c) < 2.5$?
One can get carried away with heuristics, forgetting
that one is not actually proving a theorem, but mak- 
ing a guess. Heuristics are often based on the idea of 
randomness, and all bets are off if there is some under- 
lying structure. But how do we know that there is no 
underlying structure? Consider the case of an “abcd 
conjecture.” Here we consider integers a, b, c, and d 
with a+b + c + d = 0. The condition that the terms be 
relatively prime now takes on two possible meanings: 
pairwise relatively prime or no nontrivial common divi- 
sor of all four numbers. The first condition seems more 
in the spirit of the three-term conjecture, but may be a 
tad too strong in that it disallows using any even num- 
bers. So say we take the four terms with no pair having 
a common factor greater than 2. Under this condition, 
our heuristics seem to suggest that for each $\epsilon > 0$, we
have

$\mathrm{rad}(abcd)^{1+\epsilon} < \max\{|a|, |b|, |c|, |d|\}$ (4)

for at most finitely many cases. But consider the poly- 
nomial identity 

$(x + 1)^5 = (x - 1)^5 + 10(x^2 + 1)^2 - 8$
(suggested to me by Granville). If we take x as a mul- 
tiple of 10, the four terms involved in the identity are 
pairwise relatively prime except for the last two, which 
have a common factor of 2. Let $x = 11^k - 1$, which is
a multiple of 10. The largest of the four terms is $11^{5k}$,
and the radical of the product of the four terms is at
most

$110(11^k - 2)((11^k - 1)^2 + 1) < 110 \cdot 11^{3k}$.

The heuristics are saying that this cannot be, yet here 
it is right before our eyes! 
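The family is easy to test for small $k$. The following sketch checks that the four terms sum to zero and compares the radical of their product with the bound $110 \cdot 11^{3k}$ and with the largest term $11^{5k}$. The helper names `prime_factors` and `abcd_example` are hypothetical, and trial division is used only because the numbers involved are small.

```python
def prime_factors(n):
    """Set of primes dividing the nonzero integer n (trial division)."""
    n = abs(n)
    primes, p = set(), 2
    while p * p <= n:
        if n % p == 0:
            primes.add(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes


def abcd_example(k):
    """Return (radical of the product of the four terms, largest term) for x = 11^k - 1."""
    x = 11**k - 1
    terms = [(x + 1)**5, -(x - 1)**5, -10 * (x**2 + 1)**2, 8]
    assert sum(terms) == 0  # the polynomial identity, rearranged as a + b + c + d = 0
    primes = set()
    # The primes of the product are the primes of these five bases.
    for base in (x + 1, x - 1, 10, x**2 + 1, 2):
        primes |= prime_factors(base)
    radical = 1
    for p in primes:
        radical *= p
    return radical, max(abs(t) for t in terms)


for k in (1, 2, 3):
    radical, largest = abcd_example(k)
    print(k, radical, largest, radical < 110 * 11**(3 * k))
```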

What is happening is that the polynomial identity is 
supplying an underlying structure. For the four-term 
abcd conjecture, Granville conjectures that for each 
$\epsilon > 0$, all counterexamples to (4) come from at most
finitely many polynomial families. And the number of
polynomial families grows to infinity as $\epsilon$ shrinks to 0.

We have looked here at only a small portion of 
the field of Diophantine equations, and then we have
looked mainly at the dynamic relationship between 
heuristics and computational searches for small solu- 
tions. For much more on the subject of computational 
Diophantine methods, see Smart (1998). 

Heuristic arguments often assume that the objects 
of study behave as if they were random, and we have 
visited several cases where it is useful to think this 
way. Other examples include the twin-prime conjecture 
(there are infinitely many primes $p$ such that $p + 2$
is prime), Goldbach’s conjecture (every even number
larger than 2 is the sum of two primes), and countless 
other conjectures in number theory. Often the compu- 
tational evidence for the probabilistic view is striking, 
even overwhelming, and we become convinced of the 
truth of our model. But on the other hand, if it is this
pseudo-proof that is all we have to go on, we may still 
be very far from the truth. Nevertheless, the interplay 
of computations and heuristic thinking forms an indis- 
pensable part of our arsenal, and mathematics is the 
richer for it. 

Remarks and Acknowledgments 

I would like to recommend to the reader the book by 
Cohen (1993) for a discussion of computational alge- 
braic number theory, a subject that is neglected in this 
article. I am grateful to the following people, who gener- 
ously shared their expertise: X. Gourdon, A. Granville, 
A. Odlyzko, E. Schaefer, K. Soundararajan, C. Stewart, 
R. Tijdeman, and M. van Frankenhuijsen. I am also 
thankful to A. Granville and D. Pomerance for help- 
ful suggestions with the exposition. I was supported 
in part by NSF grant DMS-0401422. 

Further Reading 

Agrawal, M., N. Kayal, and N. Saxena. 2004. PRIMES is in P.
Annals of Mathematics 160:781-93.

Buhler, J., R. Crandall, R. Ernvall, and T. Metsänkylä. 1993.
Irregular primes and cyclotomic invariants to four mil- 
lion. Mathematics of Computation 61:151-53. 

Cohen, H. 1993. A Course in Computational Algebraic Num- 
ber Theory. Graduate Texts in Mathematics, volume 138. 
New York: Springer. 

Crandall, R., and C. Pomerance. 2005. Prime Numbers: A 
Computational Perspective, 2nd edn. New York: Springer. 

Darmon, H., and A. Granville. 1995. On the equations $z^m = F(x,y)$
and $Ax^p + By^q = Cz^r$. Bulletin of the London
Mathematical Society 27:513-43.

Erdős, P., and S. Ulam. 1971. Some probabilistic remarks
on Fermat's last theorem. Rocky Mountain Journal of 
Mathematics 1:613-16. 

Granville, A., and T. J. Tucker. 2002. It’s as easy as abc. 
Notices of the American Mathematical Society 49:1224- 
31. 

Odlyzko, A. M., and A. Schönhage. 1988. Fast algorithms
for multiple evaluations of the Riemann zeta function. 
Transactions of the American Mathematical Society 309: 
797-809. 

Poonen, B., E. Schaefer, and M. Stoll. 2007. Twists of $X(7)$
and primitive solutions to $x^2 + y^3 = z^7$. Duke Mathematical
Journal 137:103-58.





Rosser, J. B., and L. Schoenfeld. 1962. Approximate formu- 
las for some functions of prime numbers. Illinois Journal 
of Mathematics 6:64-94. 

Smart, N. 1998. The Algorithmic Resolution of Diophantine
Equations. London Mathematical Society Student Texts, 
volume 41. Cambridge: Cambridge University Press. 


IV.4 Algebraic Geometry 

János Kollár


1 Introduction 

Succinctly put, algebraic geometry is the study of geom- 
etry using polynomials and the investigation of polyno- 
mials using geometry. 

Many of us were taught the beginnings of algebraic 
geometry in high school, under the name “analytic 
geometry.” When we say that $y = mx + b$ is the equation
of a line $L$, or that $x^2 + y^2 = r^2$ describes a circle $C$
of radius r, we establish a basic connection between 
geometry and algebra. 

If we want to find the points where the line L and 
the circle C intersect, we just substitute mx + b for 
$y$ in the circle equation to get $x^2 + (mx + b)^2 = r^2$
and solve the resulting quadratic equation to obtain the 
x coordinates of the two intersection points. 
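Expanding the substituted equation gives the quadratic $(1 + m^2)x^2 + 2mbx + (b^2 - r^2) = 0$, which can be solved with the usual formula. The sketch below does exactly that; the function name `line_circle_intersection` is a hypothetical choice, and only real intersection points are returned.

```python
import math


def line_circle_intersection(m, b, r):
    """Real intersection points of y = m*x + b with x^2 + y^2 = r^2."""
    # Coefficients of (1 + m^2) x^2 + 2 m b x + (b^2 - r^2) = 0.
    A = 1 + m * m
    B = 2 * m * b
    C = b * b - r * r
    disc = B * B - 4 * A * C
    if disc < 0:
        return []  # the line misses the circle (no real solutions)
    root = math.sqrt(disc)
    xs = {(-B - root) / (2 * A), (-B + root) / (2 * A)}  # a set merges a double root
    return sorted((x, m * x + b) for x in xs)


print(line_circle_intersection(1, 0, math.sqrt(2)))
print(line_circle_intersection(0, 2, 1))
```

When the discriminant is zero the line is tangent to the circle and the two intersection points coincide.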

This simple example encapsulates the method of 
algebraic geometry: a geometric problem is translated 
into algebra, where it is readily solvable; conversely, we 
get insight into algebra problems by using geometry. 
It is hard to guess the solutions of systems of poly- 
nomial equations, but once a corresponding geometric 
picture is drawn, we start to have a qualitative under- 
standing of them. The precise quantitative answer is 
then provided by algebra. 

2 Polynomials and Their Geometry 

Polynomials are the expressions one can put together
from variables and numbers by addition and multiplication.
The most familiar are one-variable polynomials
such as $x^3 - x + 4$, but we can use two or three variables
to get, for instance, $2x^5 - 3xy^2 + y^3$ (which has
degree 5 in two variables) or $x^5 - y^7 + x^2 z^8 - xyz + 1$
(which has degree 10 in three variables). In general, one
can use $n$ variables, in which case they are frequently
denoted by $x_1, x_2, \dots, x_n$, and we write $f(x_1, \dots, x_n)$,
$f(x)$, or simply $f$ to denote an unspecified polynomial.

Polynomials are the only functions that computers
can work with. (Although your pocket calculator is
likely to have a button for logarithms, it is secretly computing
a polynomial whose value at a number $b$ agrees
with $\log b$ up to many decimal places.)

Figure 1 A hyperboloid intersecting a plane.

We can slightly rewrite the equations we gave earlier
for the line $L$ and the circle $C$ as $y - mx - b = 0$ and
$x^2 + y^2 - r^2 = 0$. We can then describe $L$ and $C$ as zero
sets: $L$ is the zero set of $y - mx - b$ (that is, the set of
all points $(x, y)$ such that $y - mx - b = 0$) and $C$ is the
zero set of $x^2 + y^2 - r^2$.

Similarly, the zero set of $2x^2 + 3y^2 - z^2 - 7$ in 3-space
is a hyperboloid, the zero set of z - x - y in 3-space is a 
plane, and the common zero set of these two equations 
in 3-space is the intersection of the hyperboloid and the 
plane, which is an ellipse (see figure 1). 
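To see why the intersection is an ellipse, substitute $z = x + y$ into the first equation: this gives $x^2 - 2xy + 2y^2 - 7 = 0$, that is, $(x - y)^2 + y^2 = 7$. The sketch below checks that the conic has negative discriminant and that points of the resulting parametrization lie on both surfaces. The function names and the parametrization are illustrative assumptions, not taken from the text.

```python
import math


def hyperboloid(x, y, z):
    return 2 * x**2 + 3 * y**2 - z**2 - 7


def plane(x, y, z):
    return z - x - y


# Conic A x^2 + B x y + C y^2 = 7 obtained after substituting z = x + y.
A, B, C = 1, -2, 2
discriminant = B * B - 4 * A * C  # negative for an ellipse


def point_on_intersection(t):
    """Parametrize (x - y)^2 + y^2 = 7 and lift the point to the plane z = x + y."""
    y = math.sqrt(7) * math.sin(t)
    x = y + math.sqrt(7) * math.cos(t)
    return x, y, x + y


print(discriminant)
for t in (0.0, 1.0, 2.5):
    p = point_on_intersection(t)
    print(p, hyperboloid(*p), plane(*p))
```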

The set of common zeros of a system of polyno- 
mial equations in any number of variables is called an 
algebraic set. These are the basic objects of algebraic 
geometry. 

Most people feel that geometry ends in 3-space. Very 
few have a feeling for 4-space, also called space-time, 
and 5-space is by and large inconceivable to almost 
everyone. So what is the meaning of geometry in many 
variables? 

Algebra comes to our rescue here. While I have great 
difficulty visualizing what a four-dimensional sphere of 
radius r in 5-space should be, I can easily write down 
its equation, 

$x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 - r^2 = 0$,
and work with it. This equation is also something a 
computer can handle, which is immensely useful in 
applications. 
I will, nonetheless, stick to two or three variables
for the rest of this article. This is where all geometry 
starts and there are plenty of interesting questions and 
results. 

The importance of algebraic geometry derives from 
the fact that significant interactions between algebra
and geometry happen very frequently. Let us look at 
two examples, just for illustration. 

3 Most Shapes Are Algebraic 

Shapes that occur frequently enough to have their own 
name, for instance, lines, planes, circles, ellipses, hyperbolas,
parabolas, hyperboloids, paraboloids, ellipsoids,
are almost all algebraic. Even the more esoteric conchoid
(or shell curve) of Dürer, the trident of Newton
[VI.14], and the folium of Kepler are algebraic.

Some shapes cannot be described by polynomial 
equations, but they can be described by polynomial 
inequalities. For instance, the inequalities $0 \le x \le a$
and $0 \le y \le b$ together describe a rectangle with side
lengths a, b. Shapes described by polynomial inequali- 
ties are called semi-algebraic, and every polyhedron is 
semi-algebraic. 

Not everything is an algebraic set, though. Look, for 
example, at the graph of the sine function $y = \sin x$.
This crosses the $x$-axis infinitely many times (at multiples
of $\pi$). If $f(x)$ is any polynomial, then it has at most
as many roots as its degree, so $y = f(x)$ will never look
like $y = \sin x$.

We can, however, get very close to $\sin x$ with a polynomial
if we concentrate on values of $x$ that are not too
large. For instance, the degree-7 Taylor polynomial

$x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$

differs from $\sin x$ by an error of at most 0.1 for $-\pi < x < \pi$.
This is a very special case of a basic theorem
of Nash that says that every “reasonable” geometric
shape is algebraic if we ignore what happens very
far from the origin. So, what is reasonable? Certainly 
not everything. Fractals seem profoundly nonalgebraic. 
The nicest shapes are manifolds [1.3 §6.9], and all of 
these can be described by polynomials. 
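The error bound for the degree-7 Taylor polynomial of the sine function is easy to confirm numerically. The following sketch samples the interval from $-\pi$ to $\pi$ on a uniform grid; the names `taylor_sin_7` and `max_error` and the grid size are hypothetical choices, and a grid search only estimates the true maximum.

```python
import math


def taylor_sin_7(x):
    """Degree-7 Taylor polynomial of sin at 0."""
    return x - x**3 / 6 + x**5 / 120 - x**7 / 5040


def max_error(samples=10001):
    """Largest sampled value of |taylor_sin_7(x) - sin(x)| on [-pi, pi]."""
    worst = 0.0
    for i in range(samples):
        x = -math.pi + 2 * math.pi * i / (samples - 1)
        worst = max(worst, abs(taylor_sin_7(x) - math.sin(x)))
    return worst


print(max_error())
```

The largest error occurs at the endpoints and grows quickly outside the interval, which is why the approximation is only good for values of $x$ that are not too large.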

Nash’s theorem. Let $M$ be any manifold in $\mathbb{R}^n$. Fix any
large number $R$. Then there is a polynomial $f$ whose
zero set is as close to $M$ as we want, at least inside a
ball of radius $R$ around the origin.


4 Codes and Finite Geometries 

Consider the equation $x^2 + y^2 = z^2$, which describes
a double cone in 3-space (see figure 4). If we confine
ourselves to natural numbers, then the solutions of
$x^2 + y^2 = z^2$ are the Pythagorean triples, corresponding
to right-angled triangles where all sides have integer
lengths, of which the two best-known examples are
(3,4,5) and (5,12,13).

Let us now look at the same equation, but declare that 
we care only about the parities of the two sides (that is, 
whether they are even or odd). For instance, $3^2 + 15^2$ and
$4^2$ are both even, so we say that $3^2 + 15^2 \equiv 4^2 \pmod 2$
(see modular arithmetic [III.60]). The parities of $x^2 + y^2$
and of $z^2$ depend only on those of $x$, $y$, and $z$, so
we can pretend that x, y, and z are all either 0 (the 
even case) or 1 (the odd case). Our equation modulo 2 
therefore has four solutions: 

000, 011, 101, 110. 
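The four solutions can be found by simply trying all eight parity patterns. The sketch below does this for any modulus $m$; the function name `cone_points_mod` is a hypothetical choice, and brute force is only sensible for small $m$.

```python
from itertools import product


def cone_points_mod(m):
    """All triples (x, y, z) with entries in 0..m-1 and x^2 + y^2 = z^2 modulo m."""
    return [
        (x, y, z)
        for x, y, z in product(range(m), repeat=3)
        if (x * x + y * y - z * z) % m == 0
    ]


for point in cone_points_mod(2):
    print("".join(map(str, point)))
```

Calling `cone_points_mod(7)` lists the points of the same cone among the 343 points of 3-space modulo 7.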

These look like code words in a computer mes- 
sage. It was quite a surprise when it was discovered 
that using polynomials and their solutions modulo 2 
is a great (probably the best) way of constructing
error-correcting codes (see reliable transmission of 
INFORMATION [VII.6]). 

There is something very substantial and new happen- 
ing here. Let us think for a moment about what 3-space 
is for us. For many it is an amorphous everything, but 
for algebraic geometers (with Descartes [VI.11] as our
ancestor) it is simply a collection of points described 
by three numbers, the x, y, and z coordinates. Let us 
make a jump here, and declare that “3-space modulo 2” 
is the collection of all “points” given by three coordinates
modulo 2. Four of these are listed above, and there
are four more. The beauty of algebra is that suddenly
we can talk about lines, planes, spheres, cones in this
“3-space having only eight points.” 

We do not need to stop here, and one can work mod- 
ulo any integer. For example, working modulo 7, we 
have 0, 1, 2, 3, 4, 5, 6 as possible coordinates, and so 
“3-space modulo 7” has $7^3 = 343$ points.

Talking about geometry in these spaces is very 
intriguing, but also technically difficult. Its great reward 
is that one can view this process as a “discretization” 
of ordinary space. Working modulo n for large n (espe- 
cially when n is a prime number) gets very close to the 
usual geometry. 

This approach is especially fruitful in number-theo- 
retic questions. It was, for instance, instrumental in 
Wiles’s proof of Fermat’s last theorem. 
For more on these topics, see arithmetic geometry
[IV.5].

5 Snapshots of Polynomials 

Consider the equation $x^2 + y^2 = R$. If $R > 0$, then the
real solutions form a circle of radius $\sqrt{R}$; if $R = 0$, we
get only the origin; and if $R < 0$, we get the empty set.
Thus, if $R > 0$, then the geometry of the solution set
determines what $R$ is, but otherwise it does not. We
can of course look at complex solutions, and the complex
solutions always determine $R$. (For instance, the
intersection points with the $x$-axis are $(\pm\sqrt{R}, 0)$.)

If R is a rational number, we can ask about rational 
solutions of $x^2 + y^2 = R$, and if $R$ is an integer, we
can also look for solutions in the “plane modulo m” 
for any m. 

One can even look for solutions where $x = x(t)$,
$y = y(t)$ are themselves polynomials in a variable $t$.
(Most generally, we can ask for solutions where x, y 
are elements of any ring containing the number R.) 

To my mind, the polynomial is the central object, and 
each time we look at solution sets we are taking a “snap- 
shot” of the polynomial. Some snapshots are good (like 
the above real snapshot for R > 0) and some are bad 
(like the above real snapshot for R < 0). 

How good can snapshots be? Can we determine a 
polynomial from its snapshots? 

One frequently talks about “the” equation of a hyper- 
bola, but “an” equation would be more correct. Indeed, 
the hyperbola $x^2 - y^2 - R = 0$ can also be given by
an equation $cx^2 - cy^2 - cR = 0$, for any $c \ne 0$. We can
also use the equation $(x^2 - y^2 - R)^2 = 0$, which we may
well not recognize in its expanded form. Higher powers
can also be used. What about the equation $f(x,y) =
(x^2 - y^2 - R)(x^2 + y^2 + R^2) = 0$? If we look only
at real solutions, this is still just the hyperbola since
$x^2 + y^2 + R^2$ is always positive for $x$, $y$ real. However,
as with one-variable polynomials, one should look at all
complex roots to understand everything. Then we see
that $f(\sqrt{-1}R, 0) = 0$, but the complex point $(\sqrt{-1}R, 0)$
is not on the hyperbola $x^2 - y^2 - R = 0$. In general,
as long as $R \ne 0$, we get that if $f(x,y)$ is a polynomial
that has exactly the same complex roots as $x^2 - y^2 - R$,
then $f = c(x^2 - y^2 - R)^m$ for some $m$ and $c \ne 0$.

Why is the $R = 0$ case different? The reason is that
for $R \ne 0$ the polynomial $x^2 - y^2 - R$ is irreducible (that
is, it cannot be written as the product of other polynomials),
while $x^2 - y^2 = (x + y)(x - y)$ is reducible
with irreducible factors $x + y$ and $x - y$. In the latter
case one gets that if $g(x,y)$ is a polynomial that
has exactly the same complex roots as $x^2 - y^2$, then
$g = c \cdot (x + y)^m (x - y)^n$ for some $m$, $n$ and $c \ne 0$.

The analogous question for systems of equations 
is answered by the fundamental theorem of algebraic 
geometry. It is sometimes called Hilbert’s theorem on 
the zeros, but its German name is used most of the 
time. For simplicity, we state only the case of one
equation. 

Hilbert’s Nullstellensatz. Two complex polynomials f 
and g have the same complex solutions if and only if 
they have the same irreducible factors.

We can do even better for polynomials with integer 
coefficients. For instance, $x^2 - y^2 - 1 = 0$ and
$2(x^2 - y^2 - 1) = 0$ have the same solutions over the real or
complex numbers, and the same solutions modulo p 
for any odd prime p, but they have different solutions 
modulo 2. The general result in this case is easy and 
simple. 

Arithmetic Nullstellensatz. Two polynomials $f$ and $g$ with
integer coefficients have the same solutions
modulo $m$ for every $m$ if and only if $f = \pm g$.
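The example with $x^2 - y^2 - 1$ and its double can be checked by brute force for small moduli. In the sketch below, `solutions_mod`, `f`, and `g` are hypothetical names used only for this illustration.

```python
def solutions_mod(poly, m):
    """Set of pairs (x, y) with entries in 0..m-1 where poly vanishes modulo m."""
    return {(x, y) for x in range(m) for y in range(m) if poly(x, y) % m == 0}


def f(x, y):
    return x * x - y * y - 1


def g(x, y):
    return 2 * f(x, y)


# Same solutions modulo odd primes, different solutions modulo 2.
for p in (2, 3, 5, 7, 11):
    print(p, solutions_mod(f, p) == solutions_mod(g, p))
```

Modulo 2 the polynomial `g` vanishes identically, so every point is a solution, while `f` has only two solutions.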

6 Bézout’s Theorem and Intersection Theory 

If h(x) is a polynomial of degree n, then it has n 
complex roots, at least when they are counted with 
multiplicity. What happens with a system $f(x,y) =
g(x,y) = 0$? Geometrically we see two curves in the
plane, so we expect that there will typically be finitely
many intersection points. 

If $f$, $g$ are both linear, we have two lines in the plane.
These usually intersect in a single point, but they can 
be parallel and they can coincide. The first case leads 
to the classical declaration that “parallel lines meet at 
infinity” and the definition of projective planes and 
projective spaces [III.74]. (The introduction of projec- 
tive spaces and the corresponding projective varieties 
is a key step in algebraic geometry. It is somewhat tech- 
nical so we shall skip it here, but it is indispensable even 
at the most basic level.) 

Next, consider two polynomials of degree 2, that is, 
two plane conics. Two smooth conics usually intersect 
in at most four points (just try this by drawing two 
ellipses). There are also some rather degenerate cases. 
Two conics may coincide, or, if they are both reducible, 
they can have a common line. In any case, we are ready 
to formulate a basic result, dating back to 1779. 
Bézout’s theorem. Let $f_1(x), \dots, f_n(x)$ be $n$ polynomials
in $n$ variables, and for each $i$ let $d_i$ be the degree
of $f_i$. Then either

(i) the equations $f_1(x) = \cdots = f_n(x) = 0$ have at
most $d_1 d_2 \cdots d_n$ solutions; or
(ii) the $f_i$ vanish identically on an algebraic curve $C$,
and so there is a continuous family of solutions.

As an example, the second alternative happens for 
the system of equations $xz - y^2 = y^3 - z^2 = x^3 - z = 0$,
which has $(t, t^2, t^3)$ as a solution for any $t$. This
case is actually quite rare. If we pick the coefficients of
the polynomials $f_i$ randomly, then the first alternative
happens with probability 1. 
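That the curve $(t, t^2, t^3)$ solves all three equations can be confirmed with exact arithmetic. The sketch below substitutes rational values of $t$; the names `system` and `on_curve` are hypothetical.

```python
from fractions import Fraction


def system(x, y, z):
    """Values of the three polynomials xz - y^2, y^3 - z^2, x^3 - z."""
    return (x * z - y**2, y**3 - z**2, x**3 - z)


def on_curve(t):
    """True if the point (t, t^2, t^3) satisfies all three equations."""
    return system(t, t**2, t**3) == (0, 0, 0)


print(all(on_curve(Fraction(n, 7)) for n in range(-20, 21)))
print(system(1, 2, 3))  # a point off the curve, for contrast
```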

Ideally, we would like to make the stronger claim that 
if the first alternative happens, then there are exactly 
$d_1 d_2 \cdots d_n$ solutions, but counted “with multiplicity.”
This actually works, and gives us our first example of 
an extremely useful feature of algebraic geometry. Even 
in very degenerate situations it is possible to define 
and count the multiplicities easily. This is frequently 
of great help since the typical (or “generic”) cases are 
usually very hard to compute. To get around this prob- 
lem, we can sometimes find a special, degenerate case 
where we know that the answer will be the same, but 
the computations are much easier. 

There are two ways to think about multiplicity: one 
algebraic and one geometric. The algebraic definition 
is computationally very efficient, but somewhat techni- 
cal. The geometric interpretation is easier to explain, so 
that is the one we shall give here, but it would be hard 
to compute with in practice. 

If $x = p$ is an isolated solution of the equations
$f_1(x) = \cdots = f_n(x) = 0$ with multiplicity $m$, then
the perturbed system

$f_1(x) + \epsilon_1 = \cdots = f_n(x) + \epsilon_n = 0$

has exactly $m$ solutions near $x = p$ for almost all small
values of the $\epsilon_i$.

Intersection theory is the branch of algebraic geometry
that deals with generalizations of Bézout's theorem.
Above, we looked at intersections of hypersurfaces
(that is, of zero sets of single polynomials), but
we may wish to look at intersections of more general
algebraic sets. Also, even when the second alternative
holds, we may want to count the number of isolated
intersection points; this can be very tricky but also very
useful. 



7 Varieties, Schemes, Orbifolds, and Stacks 

Consider the system xz = yz = 0 in 3-space. It consists 
of two pieces, the z = 0 plane and the x = y = 0 
line. It is easy to see that neither the plane nor the line 
can be written as the union of algebraic sets (except by
nitpickers who point out that the line is the union of
the line itself and of any point on the line). In general,
any algebraic set can be written in exactly one way as
the union of smaller algebraic sets that in turn cannot
be decomposed further. These basic building blocks are
called irreducible algebraic sets or algebraic varieties. 

Sometimes this is not exactly what one would naively 
expect. For instance, the curve in figure 2 has two con- 
nected components. The two parts are, however, not 
algebraic sets. 

An explanation is provided by looking at the com- 
plex solutions of this equation. We shall see later that 
these form a connected set, namely a torus (with a miss- 
ing point at infinity). We see two components when 
we look at the real solutions because we are taking a
cross-section of this torus. 

In general, the zero set $f = 0$ is irreducible as an algebraic
set if and only if $f$ is irreducible as a polynomial
(or if it is the power of an irreducible polynomial). The
implication in one direction is easy to see: if $f = gh$,
then the zero set of $f$ is the union of the zero set of $g$
and of the zero set of $h$.

For many questions, keeping track only of the zero 
set is not enough. For instance, look at the polynomial 
$f = x^2(x - 1)(x - 2)^3$. It has degree 6 and three roots
at $x = 0, 1, 2$. These roots behave differently, however,
and one usually says that $f$ has a double root at $x = 0$
and a triple root at $x = 2$. If we perturb $f$ by adding
a small number $\epsilon$ to it, then the perturbed equation
$f(x) + \epsilon = 0$ has two (complex) solutions near 0, one
solution near 1 and three (complex) solutions near 2.
Thus, these multiplicities carry important geometric 
meaning about the perturbation of the equation. 
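The root counts for the perturbed equation can be confirmed numerically without solving it. The sketch below uses a technique that is not in the text: it counts zeros inside a small circle by measuring how many times the values of the polynomial wind around 0 along that circle (the argument principle). The names `zeros_inside` and `perturbed`, the radius 0.3, and the value of `eps` are all illustrative assumptions.

```python
import cmath
import math


def f(z):
    return z**2 * (z - 1) * (z - 2)**3


def zeros_inside(g, center, radius, samples=4000):
    """Count zeros of the polynomial g inside a circle via the winding number of g."""
    total = 0.0
    prev = g(center + radius)
    for k in range(1, samples + 1):
        z = center + radius * cmath.exp(2j * math.pi * k / samples)
        cur = g(z)
        total += cmath.phase(cur / prev)  # small change in argument along one step
        prev = cur
    return round(total / (2 * math.pi))


eps = 1e-6


def perturbed(z):
    return f(z) + eps


# Number of solutions of f(x) + eps = 0 within distance 0.3 of 0, 1, and 2.
counts = [zeros_inside(perturbed, c, 0.3) for c in (0, 1, 2)]
print(counts)
```

The circles are chosen small enough to separate the three roots of $f$ but large enough to contain the perturbed roots for this value of `eps`.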

Similarly, it is natural to say that while $x^2 y = 0$ and
$xy^3 = 0$ define the same algebraic set (consisting of the
two axes), the first “assigns multiplicity 2” to the y-axis 
and the other “assigns multiplicity 3” to the x-axis. 

More complicated things can happen for systems of 
equations. Consider the systems $x = y^2 = 0$ and
$x^3 = y = 0$ in 3-space. Both define the $z$-axis and it is
reasonable to say that the first does so with multiplic- 
ity 2, the second with multiplicity 3. There is, however, 
a further difference. In the first case the multiplicity 
seems to “go in the y-direction” and in the second case 
it seems to go in the x-direction. We can also look at 
other systems, like $x - cy = y^3 = 0$, if we want to see
more complicated behavior. 

Roughly speaking, a scheme is an algebraic set where 
we also keep track of the multiplicities and of the 
directions they occur in. 

Consider the xy-plane and consider the map that 
reflects across the origin. Thus a point $(x, y)$ is mapped
to $(-x, -y)$. Let us try to glue each point $(x, y)$ to its
image $(-x, -y)$. What do we get? The right half-plane
$x \ge 0$ is mapped to the left half-plane $x \le 0$, so it is
enough to work out what happens with the right half-plane.
The positive $y$-axis is glued to the negative $y$-axis,
and the resulting surface is a dunce cap (but less
pointy). 

Algebraically, it is one half of the cone $z^2 = x^2 + y^2$.
This cone looks nice and smooth except at the ver- 
tex. There it is more complicated, but the above con- 
struction shows that it can be obtained from a plane 
by a reflection across a point. More generally, suppose 
we take the $n$-dimensional space $\mathbb{R}^n$ and finitely many
symmetries of it. If we glue together points that move 
into each other, we again get an algebraic variety, most 
of whose points are smooth, but some of which are 
more complicated. A variety made up of pieces like 
these is called an orbifold. (When this is defined more 
precisely, we also keep track of which symmetries have 
been used.) In practice, such varieties occur frequently; 
that is why they deserve a separate name. 

Finally, if we marry a scheme to an orbifold, the out- 
come is a stack. The study of stacks is strongly recom- 
mended to people who would have been flagellants in 
earlier times. 


8 Curves, Surfaces, Threefolds 

As with any geometric object, one of the simplest ques- 
tions one can ask about a variety is: what is its dimen- 
sion? As expected, a curve in the plane has dimen- 
sion 1, and a surface in 3-space has dimension 2. This 
seems quite simple until one writes down examples like 
S = (x^4 + y^4 + z^4 = 0), which is only the origin in R^3.
This example is, nonetheless, still two dimensional: the
explanation is that we were looking at the wrong
snapshot. Using complex numbers we can solve the
equation as z = \sqrt[4]{-x^4 - y^4}, so the complex solutions of
x^4 + y^4 + z^4 = 0 can be described by two independent
variables x, y and a dependent variable z. Thus,
it is quite reasonable to say that S is two dimensional. 

This idea works more generally. If X is any variety in 
some complex space C^n, then choose a random set of
n independent directions to serve as a basis, or
coordinate system, for C^n, and hence for X. With
probability 1 (i.e., except in degenerate cases) one finds that
there is some d such that the first d coordinates of 
a point x in X can vary independently, while the rest 
depend on them. This number d depends on X only and 
is called the dimension (or, to be precise, the algebraic 
dimension) of X. 

If X is a variety and f is a polynomial, then the
intersection X ∩ (f = 0) has dimension one less than dim X
(unless f vanishes identically on X or never takes the
value zero on X).

If X is a subset of R^n defined by real equations,
and if it is smooth (see the next section for a
discussion of smoothness), then its topological dimension
(see dimension [III.17]) is the same as its algebraic
dimension.

For complex varieties, the topological dimension is 
twice the algebraic dimension. Thus, for an algebraic 
geometer, C^n has dimension n. In particular, for us
C is the “complex line,” whereas everybody else calls
this the “complex plane.” Our “complex plane” is, of
course, C^2.

A variety of dimension 1 is called a curve. A surface 
is a variety of dimension 2, and a threefold is a variety 
of dimension 3. 

The theory of algebraic curves is a very well devel- 
oped and beautiful subject. We shall see later how one 
can start to get an overview of all algebraic curves. Sur- 
faces have been intensively studied for the last century, 
and now we have reached a reasonably complete under- 
standing of them. This is a much more complicated 
theory than for curves. Still very little is known for 





IV. Branches of Mathematics 



Figure 3 Singular cubics: (a) y^2 = x^3 + x^2 and (b) y^2 = x^3.

varieties of dimension 3 and up. At least conjecturally, 
all these dimensions behave in roughly the same way. 
Despite some progress, especially in dimension 3, many 
questions are wide open. 

9 Singularities and Their Resolutions 

If we look at the simplest examples of algebraic curves 
in figure 3, we see that most points of a curve are 
smooth, but that there may be a finite set of more com- 
plicated singular points. Let us compare these with the 
curve in figure 2. 

All three curves pass through the origin, since their 
equation has no constant term. The equation of figure 2 
has a linear term and the curve looks nice and smooth 
at the origin, whereas the equations of figure 3 contain 
no linear term and the curves are more complicated at 
the origin. This is not an accident. For small values of x, 
the higher powers x^2, x^3, ... are much smaller than x
in absolute value, so near the origin the linear terms 
dominate. If we have only linear terms ax + by = 0, 
we get a line through the origin, and an algebraic curve 
ax + by + cx^2 + gxy + ey^2 + ··· = 0 is close to the line
ax + by = 0, at least for very small values of x and y. 

The study of a curve near another point with
coordinates (p, q) can be reduced to the case (p, q) = (0, 0) via
the coordinate change (x, y) ↦ (x - p, y - q).


In general, if f(0) = 0 and f has a (nonzero) linear
term L(f), the hypersurface f = 0 is very close to
the hyperplane L(f) = 0. This is the so-called implicit
function theorem. Such points are called smooth. Points
that are not smooth are called singular. One can easily
show that the singular points of X = (f = 0) form an algebraic
set, defined by the vanishing of all partial derivatives
∂f/∂x_i. A random hypersurface will, with probability 1,
be smooth, but there are many singular hypersurfaces 
as well. 
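
As a small illustration of this criterion, the following Python sketch tests a few points on the two cubics of figure 3. The helper name is_singular is ours, and the partial derivatives are written out by hand rather than computed symbolically; it is a sketch under these assumptions, not a general algorithm.

```python
# A point of the plane curve f = 0 is singular when f and both of its
# partial derivatives vanish there. The derivatives are entered by hand.

def is_singular(f, fx, fy, point):
    """Return True if `point` lies on f = 0 and both partials vanish there."""
    x, y = point
    return f(x, y) == 0 and fx(x, y) == 0 and fy(x, y) == 0

# Figure 3(a): y^2 = x^3 + x^2, written as f = y^2 - x^3 - x^2.
node = (
    lambda x, y: y**2 - x**3 - x**2,  # f
    lambda x, y: -3 * x**2 - 2 * x,   # df/dx
    lambda x, y: 2 * y,               # df/dy
)

# Figure 3(b): y^2 = x^3, written as f = y^2 - x^3.
cusp = (
    lambda x, y: y**2 - x**3,         # f
    lambda x, y: -3 * x**2,           # df/dx
    lambda x, y: 2 * y,               # df/dy
)

origin_is_singular_node = is_singular(*node, (0, 0))
origin_is_singular_cusp = is_singular(*cusp, (0, 0))
# (-1, 0) also lies on curve (a), but df/dx = -1 there, so it is smooth.
other_point_is_singular = is_singular(*node, (-1, 0))

print(origin_is_singular_node, origin_is_singular_cusp, other_point_is_singular)
```

Both cubics are singular at the origin, where the equations have no linear term, while the other sampled point of curve (a) is smooth.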

The smooth and singular points of an arbitrary vari- 
ety of dimension d can be defined analogously by 
comparing X with d-dimensional linear subspaces. 

Singularities also occur in other geometric fields,
such as topology and differential geometry, but by and
large these fields shy away from their study (with the
notable exception of catastrophe theory). By contrast, 
algebraic geometry provides very powerful tools for 
their investigation. 

Let us start with singularities of hypersurfaces, or 
equivalently with critical points of functions. When 
thinking about these it is natural to work not just with 
polynomials but with more general power series, that 
is, functions f(x_1, ..., x_n) that can be written as
“polynomials of infinite degree.” For simplicity of notation
we shall assume that f(0) = 0. Two functions f, g
are considered to be equivalent if there is a coordinate
change x_i ↦ φ_i(x), where each φ_i is given by a power
series, such that f(φ_1(x), ..., φ_n(x)) = g(x).

In the one-variable case, any f can be written as

f = x^m (a_m + a_{m+1} x + ···),

where a_m ≠ 0. The (inverse of the) substitution

x ↦ x (a_m + a_{m+1} x + ···)^{1/m}

then shows that f is equivalent to x^m. The functions
x^m are inequivalent for different values of m, so in this
particular case the lowest-degree monomial occurring
in f determines f up to equivalence. (Note that even if
f is a polynomial, the above change of variable involves
an infinite power series: it is because we cannot invert
polynomials, even locally, that it is more convenient to
consider general power series.)
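
To make the one-variable statement concrete, here is a short worked example of our own choosing (it does not appear in the text): the polynomial f = x^2 + x^3, for which m = 2. It shows why the change of variable is a genuine power series even though f is a polynomial.

```latex
% Worked example (an illustration, not taken from the source):
% f(x) = x^2 + x^3, so the lowest-degree monomial has m = 2.
\[
  f(x) = x^2 + x^3 = x^2(1 + x) = \bigl(x\sqrt{1+x}\,\bigr)^2 .
\]
% Writing u = x\sqrt{1+x}, we get f = u^2, and u is an infinite
% power series in x that starts with the linear term x:
\[
  u = x\sqrt{1+x}
    = x + \tfrac{1}{2}x^2 - \tfrac{1}{8}x^3 + \tfrac{1}{16}x^4 - \cdots .
\]
```

Since u begins with x, it can be inverted as a power series, and in the new coordinate f is exactly u^2.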

In general, the lowest-degree terms of a power series 
do not determine the singularity, but taking more terms 
is usually enough to do so, because of the following 
result. 

Algebraization of analytic singularities. Given a power
series f, let f_{≤N} denote the polynomial obtained from
f by deleting all monomials of degree greater than N.
If 0 is an isolated singular point of the hypersurface
(f = 0), then f is equivalent to f_{≤N} for sufficiently
large N.

To see an example of a nonisolated singularity at 0,
take

g(x, y, z) = (y + x/(1 - x))^2 - z^3
= (y + x + x^2 + x^3 + ···)^2 - z^3.

It has singular points not just at 0, but everywhere
along the curve y + (x/(1 - x)) = z = 0. On the other
hand, one can easily check that all truncations g_{≤N} do
have an isolated singular point at 0.

If we have two power series, f and g, we can view
functions of the form f + εg as perturbations of f.
A very fruitful question of singularity theory asks:
what can we say about the perturbations of a given
polynomial or power series f?

For instance, in the one-variable case, the polynomial
x^m can be perturbed as x^m + εx^r, which is equivalent
to x^r if r < m. Every perturbation contains x^m, so
if r > m, then no perturbation of x^m will be
equivalent to x^r (because near the origin x^m will be much
larger than x^r). Hence, up to equivalence, the set of all
possible perturbations of x^m is {x^r : r ≤ m}.

On the other hand, it is not hard to see that for any
given ε, there are only twenty-four different values of η
for which the polynomials xy(x^2 - y^2) + εy^2(x^2 - y^2)
and xy(x^2 - y^2) + ηy^2(x^2 - y^2) are equivalent. (Indeed,
both polynomials describe four lines through the
origin. The first one gives the lines y = 0, x = y, x = -y,
and x = -εy, and the second gives the same lines
except that η replaces ε. The linear part of any
supposed equivalence gives a linear transformation
mapping the first set of four lines to the second. There are
twenty-four ways to assign which line goes to which
line.) Thus xy(x^2 - y^2) has a continuous family of
inequivalent perturbations.

Simple singularities. Suppose that the polynomial
or power series f(x_1, ..., x_n) has only finitely many
inequivalent perturbations. Then f is equivalent to one
of the following normal forms:

A_m   x_1^{m+1} + x_2^2 + ··· + x_n^2   (m ≥ 1),

D_m   x_1^2 x_2 + x_2^{m-1} + x_3^2 + ··· + x_n^2   (m ≥ 4),

E_6   x_1^3 + x_2^4 + x_3^2 + ··· + x_n^2,

E_7   x_1^3 + x_1 x_2^3 + x_3^2 + ··· + x_n^2,

E_8   x_1^3 + x_2^5 + x_3^2 + ··· + x_n^2.



Figure 4 A resolution of the cone. 


The names should bring to mind the classification 
of Lie groups [III.50]. The connections are numerous
but not easy to explain. When n = 3, these are also 
called Du Val singularities or rational double points. 

Consider again the cone z^2 = x^2 + y^2. Earlier, we
described a two-to-one parametrization of it. Here is 
another, and for many purposes better, parametriza- 
tion over the real numbers. 

In (u, v, w)-space consider the smooth cylinder
u^2 + v^2 = 1. The map (u, v, w) ↦ (uw, vw, w) maps
the cylinder onto the cone (see figure 4). The map is
one-to-one away from the vertex, the preimage of which is
the circle u^2 + v^2 = 1 in the (w = 0)-plane.
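
The following Python sketch checks numerically, over the real numbers, that this map sends sample points of the cylinder to the cone and collapses the circle w = 0 to the vertex. The function names, the sample angles, and the tolerance are our own assumptions, and a floating-point spot check is of course no substitute for the one-line algebraic verification.

```python
import math

def resolve(u, v, w):
    """The map (u, v, w) -> (uw, vw, w) from the cylinder to the cone."""
    return (u * w, v * w, w)

def on_cone(x, y, z, tol=1e-9):
    """Check z^2 = x^2 + y^2 up to a floating-point tolerance."""
    return abs(z**2 - (x**2 + y**2)) < tol

angles = (0.0, 1.0, 2.5, 4.0)
heights = (-2.0, 0.0, 0.5, 3.0)

# Sample points (cos t, sin t, w) on the cylinder u^2 + v^2 = 1.
samples = [(math.cos(t), math.sin(t), w) for t in angles for w in heights]

all_on_cone = all(on_cone(*resolve(u, v, w)) for (u, v, w) in samples)

# Every point of the circle w = 0 is sent to the vertex (0, 0, 0).
circle_collapses = all(
    resolve(math.cos(t), math.sin(t), 0.0) == (0.0, 0.0, 0.0) for t in angles
)

print(all_on_cone, circle_collapses)
```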

(Sharp-eyed readers will have noticed that this map 
is not so nice if we use complex numbers. In general, 
we want parametrizations that work both for real and 
complex numbers, but that would be quite a bit more 
complicated to describe.) 

The advantage of the cylinder over the cone is that 
it does not have a singularity. Parametrizations of vari- 
eties in terms of smooth varieties are very useful, and 
there is a major result that tells us that they always 
exist, at least when the varieties are real or complex. 
(The corresponding result is still unknown for the finite 
geometries considered earlier.) 

Resolution of singularities (Hironaka). For any variety
X there is another smooth variety Y and a polynomially
defined surjective map π : Y → X such that π is
invertible at all smooth points of X.




(In the cone example above, one can take the whole 
cylinder, but the cylinder minus finitely many points in 
the collapsed circle would also work. In order to avoid 
such silly cases, we require π to be surjective in a very
strong sense: if a sequence of smooth points x_i ∈ X
converges to a limit in X, then a subsequence of their
preimages π^{-1}(x_i) converges to a limit in Y.)

10 Classification of Curves 

In order to get an idea of how the classification of alge- 
braic varieties should proceed, let us look at hyper- 
surfaces of degree d in n-space. These are given by 
a degree-d polynomial f(x_1, ..., x_n) = 0. The set of
all polynomials of degree at most d forms a vector
space V_{n,d}. Thus hypersurfaces have two obvious
discrete invariants, the dimension and the degree, and one
can move between hypersurfaces of the same
dimension and degree by varying the coefficients of f
continuously. Moreover, the entire set V_{n,d} is itself an algebraic
variety. Our aim is to develop a similar understanding 
for all varieties, which can be done in two steps. 

The first step is to define some integers, naturally 
attached to varieties, which stay the same if we change 
a variety continuously. Such integers are called discrete 
invariants. The simplest example is the dimension. 

The second is to show that the set of all varieties 
with the same discrete invariant is parametrized by 
another algebraic variety, called the moduli space 
[IV.8]. Moreover, we would like the variety used for this 
parametrization to be chosen as economically as pos- 
sible. We will look at this in more detail in the next 
section. 

Let us see how it is accomplished for curves. Here 
there is only one more discrete invariant besides the 
dimension, known as the genus of the curve. This 
has many different definitions: one of the simplest is 
through topology. Let E be a smooth curve and let us 
look at its complex points. Locally, this set looks like C, 
so it is a topological surface. After patching up some 
holes at infinity, we get a compact surface. Multiplication
by \sqrt{-1} gives an orientation, so basic topology tells
us that we get a sphere with a certain number of handles
attached (see differential topology [IV.7]). The
genus of the curve is defined to be the number of these 
handles (that is, the genus of the corresponding sur- 
face). To see what this means in practice, let us look at 
some examples. 

A line in 2-space is like the complex numbers, which 
can be viewed as a sphere minus a point. This sphere, 


C plus the point at infinity, is also called the Riemann 
sphere. So the genus is zero. 

Next, we look at conics. Here it is better to use some 
projective geometry. Take any tangent of the conic and 
move this so that it becomes the line at infinity. Then we
get a parabola, which, in suitable coordinates, is given
by an equation y = x^2. The polynomial map t ↦ (t, t^2),
with its inverse (x, y) ↦ x, shows that this parabola is
isomorphic to a line, so again has genus 0. 

Cubics are quite a bit more complicated. A first
warning is that y = x^3 is the wrong cubic to look at. It is
smooth (and has genus 0) but it is singular at infinity.
(The earlier expediency of keeping silent about projec- 
tive geometry starts to bite us!) In any case, the cor- 
rect thing to do is to choose the tangent line of the 
cubic at an inflection point and move that to infinity. 
After some computation we obtain a much-simplified 
equation y^2 = f(x), where f has degree 3. What is the
genus? 

Consider the special case y^2 = x(x - 1)(x - 2).
We try to understand the two-to-one projection to the
(complex) x-axis, but it is better to do this when the
x-axis has already had the point at infinity added, so
that it is the Riemann sphere. If we remove the interval
0 ≤ x ≤ 1 and the half line 2 ≤ x ≤ +∞ from the
Riemann sphere, then the function y = \sqrt{x(x - 1)(x - 2)}
has two branches. (This means that y takes two different
values for each x, the positive and negative square
roots of x(x - 1)(x - 2), but if one moves x about, one
can let y vary in a continuous way.) The sphere minus 
two slits is topologically like a cylinder, hence the com- 
plex cubic is glued together from two cylinders. So we 
get the torus and the genus is 1. 

It turns out that a smooth plane curve of degree d 
has genus (1/2)(d - 1)(d - 2), but I find this hard to see
directly topologically. 
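
The formula is easy to tabulate. The short Python sketch below (the function name is ours) evaluates it for small degrees; it applies only to curves that are smooth everywhere, including at infinity, which is why y = x^3 is not a counterexample.

```python
def plane_curve_genus(d):
    """Genus (d - 1)(d - 2) / 2 of a smooth plane curve of degree d."""
    if d < 1:
        raise ValueError("the degree must be a positive integer")
    # (d - 1)(d - 2) is a product of consecutive integers, hence even.
    return (d - 1) * (d - 2) // 2

genera = {d: plane_curve_genus(d) for d in range(1, 7)}
print(genera)  # {1: 0, 2: 0, 3: 1, 4: 3, 5: 6, 6: 10}
```

Lines and conics have genus 0 and cubics have genus 1, in agreement with the examples above; note that genus 2 does not occur among smooth plane curves.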

It is a (probably hopeless) dream of algebraic geome- 
ters to give a similarly simple description of the 
discrete invariants for higher-dimensional varieties. 
Unfortunately, the topological invariants of the com- 
plex points are not good enough, and they probably 
mislead more than help. 

As a further illustration of the approach to the clas- 
sification of curves, here is a list of all curves of low 
genus. 

Genus 0. There is only one curve of genus 0. As we 
saw, it can be realized as a line or as a conic in the 
plane.

Genus 1. Every curve of genus 1 is a plane cubic, and
it can be given by an equation of the form y^2 =
f(x), where f has degree 3. Genus-1 curves are
usually called elliptic curves [III.21], since they first
appeared (in the guise of elliptic integrals) in connec- 
tion with the arc length of ellipses. We look at these 
in more detail later. 

Genus 2. Every curve of genus 2 can be given by an
equation of the form y^2 = f(x), where f has
degree 5. (These curves are singular at infinity.) More
generally, if f has degree 2g + 1 or 2g + 2, then the
curve y^2 = f(x) has genus g. For g ≥ 3, such curves,
called hyperelliptic, are rather special.

Genus 3. Every curve of genus 3 can be realized as a 
plane curve of degree 4 (or it is hyperelliptic). 

Genus 4. Every curve of genus 4 can be presented as 
a space curve given by two equations of degrees 2 
and 3 (or it is hyperelliptic). 
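
The rule quoted under genus 2, that y^2 = f(x) has genus g when f has degree 2g + 1 or 2g + 2, can be inverted to read off the genus from the degree. The Python sketch below does this; the function name is ours, and we assume (as the text does implicitly) that f has no repeated roots.

```python
def hyperelliptic_genus(deg_f):
    """Genus of y^2 = f(x), where f has degree deg_f and distinct roots.

    Degrees 2g + 1 and 2g + 2 both give genus g, i.e. g = (deg_f - 1) // 2.
    """
    if deg_f < 1:
        raise ValueError("f must have positive degree")
    return (deg_f - 1) // 2

for degree in (3, 4, 5, 6, 7, 8):
    print(degree, hyperelliptic_genus(degree))
```

Degree 3 gives the elliptic curves of genus 1, and degrees 5 and 6 give genus 2, matching the list above.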

It should be emphasized that hyperelliptic curves do 
not form a separate family. One can move continuously 
from any hyperelliptic curve to a general curve of the 
kind described above. This can be seen through more- 
complicated representations. 

One can continue in this manner a bit longer, up to 
about genus 10, but no such explicit construction is 
possible when the genus is large. 

11 Moduli Spaces

Let us go back to plane cubics, which we parametrized 
by the vector space V_{2,3} of degree-3 polynomials in
two variables. This is not very economical. For instance,
x^3 + 2y^3 + 1 and 3x^3 + 6y^3 + 3 are different
polynomials, but define the same curve. Furthermore, there
is not much reason to distinguish x^3 + 2y^3 + 1 from
2x^3 + y^3 + 1, since they are obtained from each other
by switching the two coordinate axes. More generally,
as we have seen in the previous section, any cubic
curve can be transformed into one given by an equation
y^2 = f(x), where f = ax^3 + bx^2 + cx + d.

This is better but not yet optimal, and there are 
two more steps to take. First, one can set the leading 
coefficient of f to be 1. Indeed, substitute y = \sqrt{a} y_1
and then divide the whole equation by a to get y_1^2 =
x^3 + ···. Second, we can make a substitution x =
u x_1 + v to get another elliptic curve with equation
y^2 = f(u x_1 + v) = f_1(x_1), where f_1 is easy to write
down explicitly. One can see that these are the only
coordinate changes that we can make without messing
up the form y^2 = (cubic polynomial).


It is still not very clear what happens. To get a better 
answer, look at the three roots of f, so f(x) = (x -
r_1)(x - r_2)(x - r_3). (Again, complex numbers inevitably
appear.) If we make the substitution x ↦ (r_2 - r_1)x +
r_1, we get a new polynomial f_1(x), two of whose roots
are 0 and 1. Thus our elliptic curve is transformed into
y^2 = x(x - 1)(x - λ). So instead of the four unknown
coefficients of f, we are down to only one unknown, λ.

This form is still not completely unique. In our
transformation we sent r_1, r_2 to 0, 1, but we could have used
any two roots. For instance, we can substitute x ↦ 1 - x,
sending λ ↦ 1 - λ, or x ↦ λx, sending λ ↦ λ^{-1}. All
together, the six values

λ, 1/λ, 1 - λ, 1/(1 - λ), λ/(λ - 1), (λ - 1)/λ

give “the same” elliptic curve. Most of the time these
six values are different, but there may be coincidences.
For instance, we get only three different values if
λ = -1. This corresponds to the fact that the elliptic
curve y^2 = x(x - 1)(x + 1) has four symmetries:
(x, y) ↦ (-x, ±\sqrt{-1} y) and (x, y) ↦ (x, ±y).
(An unusual feature of elliptic curves is that they all
have the second pair of symmetries. At λ = -1 we pick
up 4/2 new symmetries, which corresponds to halving
the number of different values above.)

The best way to think about it is to view this as 
an action of the symmetric group S_3 (the group of
permutations of a three-element set) on the set C \
{0, 1}.
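
The orbit of a given λ under this action can be computed by repeatedly applying the two substitutions λ ↦ 1 - λ and λ ↦ 1/λ mentioned above. The Python sketch below does so with exact rational arithmetic; the function name is ours, and restricting to rational λ is purely a convenience of the illustration.

```python
from fractions import Fraction

def lambda_orbit(lam):
    """Orbit of lam under the maps lam -> 1 - lam and lam -> 1/lam."""
    lam = Fraction(lam)
    if lam in (0, 1):
        raise ValueError("lambda must be different from 0 and 1")
    orbit = {lam}
    frontier = [lam]
    while frontier:
        current = frontier.pop()
        # Neither map can produce 0 or 1 from a value other than 0 or 1,
        # so the division below is always defined.
        for image in (1 - current, 1 / current):
            if image not in orbit:
                orbit.add(image)
                frontier.append(image)
    return orbit

generic = lambda_orbit(3)    # six distinct values
special = lambda_orbit(-1)   # only three distinct values

print(sorted(generic))
print(sorted(special))
```

For λ = 3 the orbit has six elements, while for λ = -1 it collapses to the three values -1, 2, and 1/2, which is the coincidence described above.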

It is not at all obvious that we have run out of tricks, 
but we have in fact reached the final result.

Moduli of elliptic curves. The set of all elliptic curves 
is in a natural one-to-one correspondence with the 
points of the quotient orbifold (C \ {0, 1})/S_3. The
orbifold points correspond to the elliptic curves with extra
automorphisms. 

This is the simplest illustration of a general phe- 
nomenon. 

Moduli principle. In most cases of interest, the set of 
all algebraic varieties with fixed discrete invariants is
in a natural one-to-one correspondence with the points 
of an orbifold. The orbifold points correspond to the 
varieties with extra automorphisms. 

The moduli orbifold (also called the moduli space) of 
smooth curves of genus g is denoted by M_g. These are
among the most intensely studied orbifolds in algebraic
geometry, especially since the recent discovery of their
fundamental position in string theory [IV.17 §2] and
mirror symmetry [IV.16].

12 Effective Nullstellensatz

In order to show that there are still interesting ele- 
mentary questions in algebraic geometry, let us try to 
decide when m given polynomials f_1, ..., f_m have no
common complex zero. The classical answer is given
by the following result, which tells us that an obviously
necessary condition is in fact sufficient.

Weak Nullstellensatz. The polynomials f_1, ..., f_m
have no common complex zero if and only if there are
polynomials g_1, ..., g_m such that

g_1 f_1 + ··· + g_m f_m = 1.

Let us now make a guess that we can find g_j with
degree at most 100. We can then write

g_j = \sum_{i_1 + ··· + i_n ≤ 100} a_{j,i_1,...,i_n} x_1^{i_1} ··· x_n^{i_n},

where the a_{j,i_1,...,i_n} are indeterminates. If we write
g_1 f_1 + ··· + g_m f_m as a polynomial in the variables
x_1, ..., x_n, then all the coefficients must vanish, save
the constant term which must equal 1. Thus we get
a system of linear equations in the indeterminates
a_{j,i_1,...,i_n}. The solvability of systems of linear equations
is well understood (with good computer implementations).
Thus we can decide if there is a solution with deg g_j ≤
100. Of course it is possible that 100 was too small
a guess, and we may have to repeat the process with 
larger and larger degree bounds. Will this ever end? 
The answer is given by the following result, which was 
proved only recently. 

Effective Nullstellensatz. Let f_1, ..., f_m be
polynomials of degree less than or equal to d in n variables,
where d ≥ 3, n ≥ 2. If they have no common zero,
then g_1 f_1 + ··· + g_m f_m = 1 has a solution such that
deg g_j ≤ d^n - d.

For most systems, one can find solutions with
deg g_j ≤ (n - 1)(d - 1), but in general the upper bound
d^n - d cannot be improved.

As explained above, this provides a computational 
method for deciding whether or not a system of polyno- 
mial equations has a common solution. Unfortunately, 
this is rather useless in practice as we end up with 
exceedingly large linear systems. We still do not have a 
computationally effective and foolproof method. 
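
To get a feeling for how large these linear systems become, the Python sketch below counts the unknown coefficients when each g_j is allowed the full degree d^n - d. It uses the standard count of monomials of degree at most D in n variables, namely the binomial coefficient C(n + D, n). The function names and the sample values of m, n, and d are our own choices for illustration.

```python
from math import comb

def num_monomials(n, max_degree):
    """Number of monomials in n variables of total degree <= max_degree."""
    return comb(n + max_degree, n)

def unknowns_in_system(m, n, d):
    """Number of indeterminate coefficients when each of g_1, ..., g_m
    is allowed degree up to the bound d^n - d."""
    bound = d**n - d
    return m * num_monomials(n, bound)

small = unknowns_in_system(m=3, n=2, d=3)    # degree bound 6
larger = unknowns_in_system(m=4, n=3, d=3)   # degree bound 24
huge = unknowns_in_system(m=6, n=5, d=3)     # degree bound 240

print(small, larger, huge)
```

Already for three cubics in two variables there are 84 unknowns, and the count grows very quickly with n, which is one way to see why the method is impractical.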

13 So, What Is Algebraic Geometry? 

To me algebraic geometry is a belief in the unity of 
geometry and algebra. The most exciting and profound 


developments arise from the discovery of new connec- 
tions. We have seen hints of some of these; many more 
were left unmentioned. Born with Cartesian coordin- 
ates, algebraic geometry is now intertwined with cod- 
ing theory, number theory, computer-aided geometric 
design, and theoretical physics. Several of these con- 
nections have emerged in the last decade, and I hope 
to see many more in the future. 

Further Reading 

Most of the algebraic geometry literature is very tech- 
nical. A notable exception is Plane Algebraic Curves 
(Birkhäuser, Boston, MA, 1986), by E. Brieskorn and
H. Knörrer, which starts with a long overview of
algebraic curves through arts and sciences since antiquity,
with many nice pictures and reproductions. A Scrap- 
book of Complex Curve Theory (American Mathemat- 
ical Society, Providence, RI, 2003), by C. H. Clemens, 
and Complex Algebraic Curves (Cambridge University 
Press, Cambridge, 1992), by F. Kirwan, also start at an 
easily accessible level, but then delve more quickly into 
advanced subjects. 

The best introduction to the techniques of algebraic
geometry is Undergraduate Algebraic Geometry (Cam- 
bridge University Press, Cambridge, 1988), by M. Reid. 
For those wishing for a general overview, An Invitation 
to Algebraic Geometry (Springer, New York, 2000), by 
K. E. Smith, L. Kahanpää, P. Kekäläinen, and W. Traves, is
a good choice, while Algebraic Geometry (Springer, New 
York, 1995), by J. Harris, and Basic Algebraic Geometry, 
volumes I and II (Springer, New York, 1994), by I. R. 
Shafarevich, are suitable for more systematic readings. 


IV. 5 Arithmetic Geometry 

Jordan S. Ellenberg 


1 Diophantine Problems, Alone and in Teams 

Our goal is to sketch some of the essential ideas of 
arithmetic geometry; we begin with a problem which, 
on the face of it, involves no geometry and only a bit of 
arithmetic. 

Problem. Show that the equation 

x^2 + y^2 = 7z^2    (1)

has no solution in nonzero rational numbers x, y, z.

(Note that it is only in the coefficient 7 that (1) differs
from the Pythagorean equation x^2 + y^2 = z^2, which
we know has infinitely many solutions. It is a feature of
arithmetic geometry that modest changes of this kind
can have drastic effects!)

Solution. Suppose x, y, z are rational numbers satis- 
fying (1); we will derive from this a contradiction. 

If n is the least common denominator of x, y, z, we 
can write 

x = a/n, y = b/n, z = c/n

such that a, b, c, and n are integers. Our original
equation (1) now becomes

(a/n)^2 + (b/n)^2 = 7(c/n)^2,

and multiplying through by n^2 one has

a^2 + b^2 = 7c^2.    (2)

If a, b, and c have a common factor m, then we can
replace them by a/m, b/m, and c/m, and (2) still holds
for these new numbers. We may therefore suppose that 
a, b, and c are integers with no common factor. 

We now reduce the above equation modulo 7 (see 
modular arithmetic [III.60]). Denote by ā and b̄ the
reductions of a and b modulo 7. The right-hand side of
(2) is a multiple of 7, so it reduces to 0. We are left with

ā^2 + b̄^2 = 0.    (3)

Now there are only seven possibilities for ā, and seven
possibilities for b̄. So the analysis of the solutions of
(3) amounts to checking the forty-nine choices of ā, b̄
and seeing which ones satisfy the equation. A few
minutes of calculation are enough to convince us that (3) is
satisfied only if ā = b̄ = 0.
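
The forty-nine cases can also be checked mechanically. The following Python sketch (variable names are ours) enumerates all pairs of residues modulo 7 and keeps those whose squares sum to 0 modulo 7.

```python
# All pairs (a, b) of residues modulo 7 with a^2 + b^2 = 0 (mod 7).
solutions_mod_7 = [
    (a, b)
    for a in range(7)
    for b in range(7)
    if (a * a + b * b) % 7 == 0
]

# The squares modulo 7, for comparison: no two nonzero squares sum to 0 or 7.
squares_mod_7 = sorted({(a * a) % 7 for a in range(7)})

print(solutions_mod_7)  # [(0, 0)]
print(squares_mod_7)    # [0, 1, 2, 4]
```

The only solution is the pair (0, 0), as claimed; the list of squares shows why, since the negative of a nonzero square (3, 5, or 6 modulo 7) is never itself a square.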

But saying that ā = b̄ = 0 is the same as saying that
a and b are both multiples of 7. This being the case,
a^2 and b^2 are both multiples of 49. It follows that their
sum, 7c^2, is a multiple of 49 as well. Therefore, c^2 is
a multiple of 7, and this implies that c itself is a mul- 
tiple of 7. In particular, a, b, and c share a common 
factor of 7. We have now arrived at the desired contra- 
diction, since we chose a, b, and c to have no common 
factor. Thus, the hypothesized solution leads us to a 
contradiction, so we are forced to conclude that there 
is not, in fact, any solution to (1) consisting of nonzero
rational numbers. 1 

1. Exercise: why does our argument not obtain a contradiction from
the solution x = y = z = 0?

In general, the determination of rational solutions to
a polynomial equation like (2) is called a Diophantine
problem. We were able to dispose of (2) in a paragraph,
but that turns out to be the exception: in general,
Diophantine problems can be extraordinarily difficult. For
instance, we might modify the exponents in (2) and
consider the equation

x^5 + y^5 = 7z^5.    (4)

I do not know whether (4) has any solutions in nonzero 
rational numbers or not; one can be sure, though, that 
determining the answer would be a substantial piece 
of work, and it is quite possible that the most powerful 
techniques available to us are insufficient to answer this 
simple question. 

More generally, one can take an arbitrary commuta- 
tive ring [III.83] R, and ask whether a certain polyno- 
mial equation has solutions in R. For instance, does 
(2) have a solution with x, y, z in the polynomial 
ring C[t]? (The answer is yes. We leave it as an exercise 
to find some solutions.) We call the problem of solving 
a polynomial equation over R a Diophantine problem 
over R. The subject of arithmetic geometry has no pre- 
cise boundary, but to a first approximation one may say 
that it concerns the solution of Diophantine problems 
over subrings of number fields [III.65]. (To be honest, 
a problem is usually called Diophantine only when R is 
a subring of a number field. However, the more general
definition suits our current purposes.) 

With any particular equation like (2), one can asso- 
ciate infinitely many Diophantine problems, one for 
each commutative ring R. A central insight, in some
sense the basic insight, of modern algebraic geometry
is that this whole gigantic ensemble of problems can 
be treated as a single entity. This widening of scope 
reveals structure that is invisible if we consider each 
problem on its own. The aggregate we make of all these 
Diophantine problems is called a scheme. We will return 
to schemes later, and will try, without giving precise 
definitions, to convey some sense of what is meant by 
this not very suggestive term. 

A word of apology: I will give only the barest sketch 
of the immense progress that has taken place in
arithmetic geometry in recent decades; there is simply too
much to cover in an article of the present scope. I have
chosen instead to discuss at some length the idea of 
a scheme, assuming, I hope, minimal technical know- 
ledge on the part of the reader. In the final section, 
I shall discuss some outstanding problems in arith- 
metic geometry with the help of the ideas developed 
in the body of the article. It must be conceded that the 
theory of schemes, developed by Grothendieck and his
collaborators in the 1960s, belongs to algebraic
geometry as a whole, and not to arithmetic geometry alone.
I think, though, that in the arithmetic setting, the use 
of schemes, and the concomitant extension of geomet- 
ric ideas to contexts that seem “nongeometric” at first 
glance, is particularly central. 

2 Geometry without Geometry 

Before we dive into the abstract theory of schemes, let 
us splash around a little longer among the polynomial 
equations of degree 2. Though it is not obvious from
our discussion so far, the solution of Diophantine prob- 
lems is properly classified as part of geometry. Our goal 
here will be to explain why this is so. 

Suppose we consider the equation 

x^2 + y^2 = 1.    (5)

One can ask: which values of x, y ∈ Q satisfy (5)? This
problem has a flavor very different from that of the
previous section. There we looked at an equation with no
rational solutions. We shall see in a moment that (5),
by contrast, has infinitely many rational solutions. The
solutions x = 0, y = 1 and x = 3/5, y = -4/5 are
representative examples. (The four solutions (±1, 0) and
(0, ±1) are the ones that would be said, in the usual
mathematical parlance, to be “staring you in the face.”)

Equation (5) is, of course, immediately recognizable 
as “the equation of a circle.” What, precisely, do we 
mean by that assertion? We mean that the set of pairs of 
real numbers (x,y) satisfying (5) forms a circle when 
plotted in the Cartesian plane. 

So geometry, as usually construed, makes its en- 
trance in the figure of the circle. Now suppose that we 
want to find more solutions to (5). One way to proceed 
is as follows. Let P be the point (1,0), and let L be a 
line through P of slope m. Then we have the following 
geometric fact.

(G) The intersection of a line with a circle consists of 
either zero, one, or two points; the case of a single 
point occurs only when the line is tangent to the 
circle. 

From (G) we conclude that, unless L is the tangent line 
to the circle at P, there is exactly one point other than
P where the line intersects the circle. In order to find
solutions (x, y) to (5), we must determine coordinates
for this point. So suppose L is the line through (1, 0)
with slope m, which is to say it is the line L_m whose
equation is y = m(x - 1). Then in order to find the
x-coordinates of the points of intersection between L_m
and the circle, we need to solve the simultaneous
equations y = m(x - 1) and x^2 + y^2 = 1; that is, we need
to solve x^2 + m^2 (x - 1)^2 = 1 or, equivalently,

(1 + m^2)x^2 - 2m^2 x + (m^2 - 1) = 0.    (6)

Of course, (6) has the solution x = 1. How many other 
solutions are there? The geometric argument above 
leads us to believe that there is at most one other solution
to (6). Alternatively, we can use the following algebraic
fact, which is analogous² to the geometric fact (G).

(A) The equation (1 + m^2)x^2 - 2m^2 x + (m^2 - 1) = 0
has either zero, one, or two solutions in x.

Of course, the conclusion of statement (A) holds for 
any nontrivial quadratic equation in x, not just (6); it 
is a consequence of the factor theorem. 

In this case, it is not really necessary to appeal to any
theorem; one can find by direct computation that the
solutions of (6) are x = 1 and x = (m^2 - 1)/(m^2 + 1).
We conclude that the intersection between the unit
circle and L_m consists of (1, 0) and the point P_m with
coordinates

P_m = ((m^2 - 1)/(m^2 + 1), -2m/(m^2 + 1)). (7)

Equation (7) establishes a correspondence m ↦ P_m,
which associates with each slope m a solution P_m to (5).
What is more, since every point on the circle, other than 
(1,0) itself, is joined to (1, 0) by a unique line, we find 
that we have established a one-to-one correspondence 
between slopes m and solutions, other than (1,0), to 
equation (5). 

A very nice feature of this construction is that it 
allows us to construct solutions to (5) not only over 
R but over smaller fields, like Q: it is evident that, when
m is rational, so are the coordinates of the solution
yielded by (7). For example, taking m = 2 yields the
solution (3/5, -4/5). In fact, not only does (7) show us that
(5) admits infinitely many solutions over Q, it also gives 
us an explicit way to parametrize the solutions in terms 
of a variable m. We leave it as an exercise to prove that 
the solutions of (5) over Q, apart from (1,0), are in one- 
to-one correspondence with rational values of m. Alas, 
rare is the Diophantine problem whose solutions can 
be parametrized in this way! Still, polynomial equations 
like (5) with solutions that can be parametrized by one 


2. Note that (A), unlike (G), contains no mention of tangency; that is
because the notion of tangency is more subtle in the algebraic setting, 
as we will see in section 4 below. 
or more variables play a special role in arithmetic geometry;
they are called rational varieties and constitute by
any measure the best-understood class of examples in 
the subject. 
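
As a hedged illustration of the parametrization (7), the following Python sketch computes P_m = ((m^2 - 1)/(m^2 + 1), -2m/(m^2 + 1)) in exact rational arithmetic and checks that each point satisfies (5). The function name `circle_point` and the sample slopes are our own choices, not notation from the text.

```python
from fractions import Fraction

def circle_point(m):
    """Return P_m = ((m^2 - 1)/(m^2 + 1), -2m/(m^2 + 1)) for a rational slope m."""
    m = Fraction(m)
    d = m * m + 1              # never zero when m is rational
    return ((m * m - 1) / d, -2 * m / d)

# Every rational slope gives a rational point on x^2 + y^2 = 1.
for m in [0, 1, 2, Fraction(1, 3), Fraction(-7, 4)]:
    x, y = circle_point(m)
    assert x * x + y * y == 1
    print(m, "->", x, y)
```

Because `Fraction` never rounds, the equality test is exact; with floating-point numbers one would need a tolerance instead.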

I want to draw your attention to one essential fea- 
ture of this discussion. We relied on geometric intu- 
ition (e.g., our knowledge of facts like (G)) to give us 
ideas about how to construct solutions to (5). On the 
other hand, now that we have erected an algebraic justification
for our construction, we can kick away our
geometric intuition as needless scaffolding. It was a
geometric fact about lines and circles that suggested
to us that (6) should have only one solution other than
x = 1. However, once one has had that thought, one can
prove that there is at most one such solution by means
of the purely algebraic statement (A), which involves no 
geometry whatsoever. 

The fact that our argument can stand without any reference
to geometry means that it can be applied in situations
that might not, at first glance, seem geometric.
For instance, suppose we wished to study solutions to
(5) over the finite field F_7. Now this solution set would
not seem rightfully to be called “a circle” at all; it is
just a finite set of points! Nonetheless, our geometrically
inspired argument still works perfectly. The possible
values of m in F_7 are 0, 1, 2, 3, 4, 5, 6, and the
corresponding solutions P_m are (-1,0), (0,-1), (2,2),
(5,5), (5,2), (2,5), (0,1). These seven points, together
with (1,0), form the whole solution set of (5) over F_7.
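
The computation over the field with seven elements can be replayed mechanically. The sketch below is only an illustration under our own naming (`circle_points_brute`, `circle_points_param`): it lists the solutions once by exhaustive search and once through the slope formula, and checks that the two sets agree. Coordinates are printed as residues 0, ..., 6, so -1 appears as 6.

```python
def circle_points_brute(p):
    """All (x, y) with x^2 + y^2 = 1 in F_p, found by exhaustive search."""
    return {(x, y) for x in range(p) for y in range(p)
            if (x * x + y * y - 1) % p == 0}

def circle_points_param(p):
    """The base point (1, 0) together with the points
    ((m^2 - 1)/(m^2 + 1), -2m/(m^2 + 1)) for slopes m with m^2 + 1 != 0 in F_p."""
    pts = {(1, 0)}
    for m in range(p):
        d = (m * m + 1) % p
        if d == 0:
            continue                  # the formula is undefined for this slope
        inv = pow(d, -1, p)           # modular inverse (Python 3.8+)
        pts.add(((m * m - 1) * inv % p, (-2 * m) * inv % p))
    return pts

p = 7
print(sorted(circle_points_param(p)))
assert circle_points_param(p) == circle_points_brute(p)
```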

We have now started to reap the benefits of consid- 
ering a whole bundle of Diophantine problems at once; 
in order to find the solutions to (5) over F_7, we used
a method that was inspired by the problem of finding
solutions to (5) over R. Similarly, in general, methods
suggested by geometry can help us solve Diophantine
problems. And these methods, once translated into
purely algebraic form, still apply in situations that do 
not appear to be geometric. 

We must now open our minds to the possibility that 
the purely algebraic appearance of certain equations is 
deceptive. Perhaps there could be a sense of “geometry” 
that was general enough to include entities like the 
solution set of (5) over F_7, and in which this particular
example had every right to be called a “circle.” And why
not? It has properties a circle has: most importantly for
us, it has either zero, one, or two intersection points
with any line. Of course, there are features of “circleness”
which this set of points lacks: infinitude, continuity,
roundness, etc. But these latter qualities turn out to
be inessential when we are doing arithmetic geometry.
From our viewpoint the set of solutions of (5) over F_7
has every right to be called the unit circle.

To sum up, you might think of the modern point of 
view as an upending of the traditional story of Carte- 
sian space. There, we have geometric objects (curves, 
lines, points, surfaces) and we ask questions such as, 
“What is the equation of this curve?” or “What are the 
coordinates of that point?” The underlying object is the 
geometric one, and the algebra is there to tell us about 
its properties. For us, the situation is exactly reversed:
the underlying object is the equation, and the various
geometric properties of solution sets of the equation
are merely tools that tell us about the equation's algebraic
properties. For an arithmetic geometer, “the unit
circle” is the equation x^2 + y^2 = 1. And the round thing
on the page? That is just a picture of the solutions to
the equation over R. It is a distinction that makes a
remarkable difference. 

3 From Varieties to Rings to Schemes 

In this section, we will attempt to give a clearer answer 
to the question, “What is a scheme?” Instead of trying to
lay out a precise definition, which requires more algebraic
apparatus than would fit comfortably here, we
will approach the question by means of an analogy.

3.1 Adjectives and Qualities 

So let us think about adjectives. Any adjective, such as 
“yellow” for instance, picks out a set of nouns to which 
the adjective applies. For each adjective A, we might 
call this set of nouns Γ(A). For instance, Γ(“yellow”) is
an infinite set that might look like {lemon, school bus,
banana, sun, . . . }.³ And anyone would agree that Γ(A)
is an important thing to know about A. 

Now suppose that, moved by a desire for lexical parsimony,
a theoretician among us suggested that adjectives
could in fact be dispensed with entirely. If, instead
of A, we spoke only of Γ(A), we could get by with a
grammatical theory involving only nouns. 

Is this a good idea? Well, there are certainly some 
obvious ways that things could go wrong. For instance, 
what if lots of different adjectives were sent to the same 
set of nouns? Then our new viewpoint would be less 
precise than the old one. But it certainly seems that if 
two adjectives apply to exactly the same set of nouns,


3. Of course, in real life, there are nouns whose relationship with
“yellow” is not so clear-cut, but since our goal is to make this look like
mathematics, let us pretend that every object in the world is either
definitively yellow or definitively not yellow.

then it is fair to say that the adjectives are the same, or
at least synonymous. 

What about relationships between adjectives? For 
instance, we can ask of two adjectives whether one 
is stronger than another, in the way that “gigantic” 
is stronger than “large.” Is this relationship between 
adjectives still visible on the level of sets of nouns? The 
answer is yes: it seems fair to say that A is “stronger 
than” B precisely when Γ(A) is a subset of Γ(B). In other
words, what it means to say that “gigantic” is stronger 
than “large” is that all gigantic things are large, though 
some large things may not be gigantic. 

So far, so good. We have paid a price in technical diffi- 
culty: it is much more cumbersome to speak of infinite 
sets of nouns than it was to use simple, familiar adjectives.
But we have gained something, too: the opportunity
for generalization. Our theoretician (whom we
may now call a “set-theoretic grammarian”) observes
that there is, perhaps, nothing special about the sets
of nouns that happen to be of the form Γ(A) for some
already known adjective A. Why not take a conceptual 
leap and redefine the word “adjective” to mean “a set of 
nouns”? To avoid confusion with the usual meaning of 
“adjective,” the theoretician might even use a new term, 
like “quality,” to refer to his new objects of study. 

Now we have a whole new world of qualities to play 
with. For example, there is a quality {“school bus”, 
“sun”} which is stronger than “yellow,” and a quality 
{“sun”} (not the same thing as the noun “sun”!) which is 
stronger than the qualities “yellow,” “gigantic,” “large,” 
and {“school bus”, “sun”}. 
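
For readers who like to see the parable in executable form, here is a small sketch in Python. The noun sets are hypothetical finite stand-ins for the (infinite) sets in the text, and the helper `stronger` is our own name; the only thing taken from the text is the rule that one quality is stronger than another when its set of nouns is a subset of the other's.

```python
# Hypothetical, finite stand-ins for the noun sets attached to three adjectives.
gamma = {
    "yellow":   frozenset({"lemon", "school bus", "banana", "sun"}),
    "large":    frozenset({"school bus", "sun", "whale", "mountain"}),
    "gigantic": frozenset({"sun", "mountain"}),
}

def stronger(a, b):
    """Quality a is 'stronger than' quality b when a is a subset of b."""
    return a <= b

# An adjective-style comparison, read off from the sets alone.
assert stronger(gamma["gigantic"], gamma["large"])

# Qualities that do not come from any listed adjective.
bus_and_sun = frozenset({"school bus", "sun"})
only_sun = frozenset({"sun"})
assert stronger(bus_and_sun, gamma["yellow"])
assert all(stronger(only_sun, q) for q in [*gamma.values(), bus_and_sun])
```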

I may not have convinced you that, on balance, this 
reconception of the notion of “adjective” is a good 
idea. In fact, it probably is not, which is why set-theoretic
grammar is not a going concern. The corresponding
story in algebraic geometry, however, is quite a
different matter. 

3.2 Coordinate Rings 

A warning: the next couple of sections will be difficult 
going for those not familiar with rings and ideals— such 
readers can either skip to section 4, or try to follow the 
discussion after reading rings, ideals, and modules 
[III.83] (see also algebraic numbers [IV.1]).

Let us recall that a complex affine variety (from now 
on, just “variety”) is the set of solutions over C to some
finite set of polynomial equations. For instance, one
variety V we could define is the set of points (x, y)
in C^2 satisfying our favorite equation

x^2 + y^2 = 1. (8)

Then V is what we called in the previous section “the 
unit circle,” though in fact the shape of the set of
complex solutions of (8) is a sphere with two points
removed. (This is not supposed to be obvious.) It is a
question of general interest, given some variety X, to
understand the ring of polynomial functions that take
points on X to complex numbers. This ring is called the
coordinate ring of X, and is denoted Γ(X).

Certainly, given any polynomial in x and y, we can 
regard it as a function defined on our particular variety
V. So is the coordinate ring of V just the polynomial
ring C[x, y]? Not quite. Consider, for instance, the
function f = 2x^2 + 2y^2 + 5. If we evaluate this function
at various points on V,

f(0, 1) = 7, f(1, 0) = 7,
f(1/√2, 1/√2) = 7, f(i, √2) = 7,

we notice that f keeps taking the same value; indeed,
since x^2 + y^2 = 1 for all (x, y) ∈ V, we see that f =
2(x^2 + y^2) + 5 takes the value 7 at every point on V.
So 2x^2 + 2y^2 + 5 and 7 are just different names for the
same function on V.
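
A quick numerical check of this observation, offered only as a sketch: the helper names `f` and `on_V` and the extra sample point are our own, and floating-point complex arithmetic forces us to compare up to a small tolerance rather than exactly.

```python
import cmath

def f(x, y):
    """The polynomial 2x^2 + 2y^2 + 5, evaluated at complex numbers."""
    return 2 * x * x + 2 * y * y + 5

def on_V(x, y, tol=1e-12):
    """True when (x, y) satisfies x^2 + y^2 = 1 up to rounding error."""
    return abs(x * x + y * y - 1) < tol

r = 1 / cmath.sqrt(2)
x0 = 0.3 + 0.4j                       # an arbitrary extra sample point
points = [(0, 1), (1, 0), (r, r), (1j, cmath.sqrt(2)),
          (x0, cmath.sqrt(1 - x0 * x0))]

for x, y in points:
    assert on_V(x, y)
    assert abs(f(x, y) - 7) < 1e-12   # f agrees with the constant 7 on V
```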

So Γ(V) is smaller than C[x, y]; it is the ring obtained
from C[x, y] by declaring two polynomials f and g
to be the same function whenever they take the same
value at every point of V. (More formally, we are defining
an equivalence relation [I.2 §2.3] on the set of
complex polynomials in two variables.) It turns out that
f and g have this property precisely when their difference
is a multiple of x^2 + y^2 - 1. Thus, the ring of polynomial
functions on V is the quotient of C[x, y] by the
ideal generated by x^2 + y^2 - 1. This ring is denoted by
C[x, y]/(x^2 + y^2 - 1).

We have shown how to attach a ring of functions to
any variety. It is not hard to show that, if X and Y are
two varieties, and if their coordinate rings Γ(X) and
Γ(Y) are isomorphic [I.3 §4.1], then X and Y are in
a sense the “same” variety. It is a short step from this 
observation to the idea of abandoning the study of vari- 
eties entirely in favor of the study of rings. Of course, 
we are here in the position of the set-theoretic gram- 
marian in the parable above, with “variety” playing the 
part of “adjective” and “coordinate ring” the part of “set 
of nouns.” 

Happily, we can recover the geometric properties of 
a variety from the algebraic properties of its coordinate 
ring; if this were not the case, the coordinate ring would 
not be such a useful object! The relationship between
geometry and algebra is a long story— and much of it 
belongs to algebraic geometry in general, not arithmetic 
geometry in particular — but to give the flavor, let us 
discuss some examples. 

A straightforward geometric property of a variety is 
irreducibility. We say a variety X is reducible if X can 
be expressed as the union of two varieties X_1 and X_2,
neither of which is the whole of X. For example, the
variety

x^2 = y^2 (9)

in C^2 is the union of the lines x = y and x = -y. A variety
is called irreducible if it is not reducible. All varieties
are thus built up from irreducible varieties: the relation- 
ship between irreducible varieties and general varieties 
is rather like the relationship between prime numbers 
and general positive integers. 

Moving from geometry to algebra, we recall that a 
ring R is called an integral domain if, whenever f, g
are nonzero elements of R, their product fg is also
nonzero; the ring C[x, y] is a good example.

Fact. A variety X is irreducible if and only if Γ(X) is an
integral domain.

Experts will note that we are glossing over issues of 
“reducedness” here. 

We will not prove this fact, but the following example
is illustrative: consider the two functions f = x - y
and g = x + y on the variety X defined by (9). Neither
of these functions is the zero function; note, for
instance, that f(1, -1) is nonzero, as is g(1, 1). Their
product, however, is x^2 - y^2, which is equal to zero
on X; so Γ(X) is not an integral domain. Notice that
the functions f and g that we chose are closely related
to the decomposition of X as the union of two smaller 
varieties. 

Another crucial geometric notion is that of functions
from one variety to another. (It is common practice to
call such functions “maps” or “morphisms”; we will use
the three words interchangeably.) For instance, suppose
that W is the variety in C^3 determined by the
equation xyz = 1. Then the map F : C^3 → C^2 defined
by

F(x, y, z) = ((1/2)(x + yz), (1/2i)(x - yz))

maps points of W to points of V.
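
The claim can be spot-checked numerically. In the sketch below we take F to be the map sending (x, y, z) to ((x + yz)/2, (x - yz)/(2i)); the sample points are arbitrary choices of ours, with z chosen so that xyz = 1, and the comparison is made up to floating-point tolerance.

```python
def F(x, y, z):
    """The map (x, y, z) -> ((x + yz)/2, (x - yz)/(2i))."""
    return ((x + y * z) / 2, (x - y * z) / (2j))

# Choose x and y freely (both nonzero) and solve xyz = 1 for z.
samples = [(1, 1), (2, -3), (0.5 + 1j, 2 - 1j)]
for x, y in samples:
    z = 1 / (x * y)
    a, b = F(x, y, z)
    # The image should satisfy a^2 + b^2 = 1, i.e. lie on V.
    assert abs(a * a + b * b - 1) < 1e-9
```

The algebra behind the check is short: a^2 + b^2 = ((x + yz)^2 - (x - yz)^2)/4 = xyz, which equals 1 on W.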

It turns out that knowing the coordinate rings of 
varieties makes it very easy to see the maps between 
the varieties. We merely observe that if G : V_1 → V_2
is a map between varieties V_1 and V_2, and if f is a
polynomial function on V_2, then we have a polynomial
function on V_1 that sends every point v to f(G(v)).
This function on V_1 is denoted by G*(f). For example,
if f is the function x + y on V, and F is the map
above, F*(f) = (1/2)(x + yz) + (1/2i)(x - yz). It is easy
to check that G* is a C-algebra homomorphism (that is,
a homomorphism of rings that sends each element of
C to itself) from Γ(V_2) to Γ(V_1). What is more, one has
the following theorem. 

Fact. For any pair of varieties V, W, the correspondence
sending G to G* is a bijection between the polynomial
functions sending W to V and the C-algebra
homomorphisms from Γ(V) to Γ(W).

You would not be far off in thinking of the statement 
“there is an injective map from V to W” as analogous 
to “quality A is stronger than quality B.”

The move to transform geometry into algebra is 
not something one undertakes out of sheer love of 
abstraction, or hatred of geometry. Instead, it is part 
of the universal mathematical instinct to unify seemingly
disparate theories. I cannot put it any better
than Dieudonné (1985) does in his History of Algebraic 
Geometry: 

. . . from [the 1882 memoirs of] Kronecker and Dede- 
kind-Weber dates the awareness of the profound 
analogies between algebraic geometry and the theory 
of algebraic numbers, which originated at the same 
time. Moreover, this conception of algebraic geometry 
is the most simple and most clear for us, trained as 
we are in the wielding of “abstract” algebraic notions: 
rings, ideals, modules, etc. But it is precisely this 
“abstract” character that repulsed most contempo- 
raries, disconcerted as they were by not being able 
to recover the corresponding geometric notions easily.
Thus the influence of the algebraic school remained
very weak up until 1920. . . . It certainly seems that Kronecker
was the first to dream of one vast algebraico-geometric
construction comprising these two theories
at once; this dream has begun to be realized only
recently, in our era, with the theory of schemes. 

Let us therefore move on to schemes. 

3.3 Schemes 

We have seen that each variety X gives rise to a ring 
Γ(X), and furthermore that the algebraic study of these
rings can stand in for the geometric study of varieties. 
But just as not every set of nouns corresponds to an 
adjective, not every ring arises as the coordinate ring 
of a variety. For example, the ring Z of integers is not 
the coordinate ring of a variety, as we can see by the
following argument: for every complex number a and
every variety V, the constant function a is a function on
V, and therefore C ⊂ Γ(V) for every variety V. Since Z
does not contain C as a subring, it is not the coordinate
ring of any variety. 

Now we are ready to imitate the set-theoretic grammarian’s
coup de grâce. We know that some, but not all,
rings arise from geometric objects (varieties); and we 
know that the geometry of these varieties is described 
by algebraic properties of these special rings. Why not, 
then, just consider every ring R to be a “geometric 
object” whose geometry is determined by algebraic 
properties of R? The grammarian needed to invent a
new word, “quality,” to describe his generalized adjec- 
tives; we are in the same position with our rings-that- 
are-not-coordinate-rings; we will call them schemes. 

So, after all this work, the definition of scheme is 
rather prosaic: schemes are rings! (In fact, we are hiding
some technicalities; it is correct to say that affine
schemes are rings. Restricting our attention to affine 
schemes will not interfere with the phenomena that 
we are aiming to explain.) More interesting is to ask 
how we can carry out the task whose difficulty “dis- 
concerted” the early algebraic geometers— how can we 
identify “geometric” features of arbitrary rings? 

For instance, if R is supposed to be an arbitrary geo- 
metric object, it ought to have “points.” But what are 
the “points” of a ring? Clearly we cannot mean by this 
the elements of the ring; for in the case R = Γ(X), the
elements of R are functions on X, not points on X. What
we need, given a point p on X, is some entity attached 
to the ring R that corresponds to p. 

The key observation is that we can think of p as a 
map from Γ(X) to C: given a function f from Γ(X)
we map it to the complex number f(p). This map is a
homomorphism, called the evaluation homomorphism
at p. Since points on X give us homomorphisms on
Γ(X), a natural way to define the word “point” for the
ring R = Γ(X), without using geometry, is to say that
a “point” is a homomorphism from R to C. It turns out 
that the kernel of such a homomorphism is a prime 
ideal. Moreover, with the exception of the zero ideal, 
every prime ideal of R arises from a point p of X. So a 
very concise way to describe the points of X might be 
to say that they are the nonzero prime ideals of R. 

The definition we have arrived at makes sense for 
all rings R, and not just those of the form R = Γ(X).
So we might define the “points” of a ring R to be its
prime ideals. (Considering all prime ideals, rather than
only the nonzero ones, turns out to be a wiser technical
choice.) The set of prime ideals of R is given the name 
Spec R, and it is Spec R that we call the scheme associ- 
ated with R. (More precisely, Spec R is defined to be a 
“locally ringed topological space” whose points are the 
prime ideals of R, but we will not need the full power 
of this definition for our discussion here.) 

We are now in a position to elucidate our claim, 
made in the first section, that a scheme incorporates 
into one package Diophantine problems over many dif- 
ferent rings. Suppose, for instance, that R is the ring 
Z[x, y]/(x^2 + y^2 - 1). We are going to catalog the homomorphisms
f : R → Z. To specify f, I merely have to
tell you the values of f(x) and f(y) in Z. But I cannot
choose these values arbitrarily: since x^2 + y^2 - 1 = 0
in R, it must be the case that

f(x)^2 + f(y)^2 - 1 = 0

in Z. In other words, the pair (f(x), f(y)) constitutes a
solution over Z to the Diophantine equation
x^2 + y^2 = 1. What is more, the same argument shows that, for any
ring S, a homomorphism f : R → S yields a solution
over S to x^2 + y^2 = 1, and vice versa. In summary,

for each S, there is a one-to-one correspondence between
the set of ring homomorphisms from R to S,
and solutions over S to x^2 + y^2 = 1.

This behavior is what we have in mind when we say that 
the ring R “packages” information about Diophantine 
equations over different rings. 
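
To make the "packaging" concrete, here is a hedged sketch for target rings of the form Z/nZ. It lists the solutions of x^2 + y^2 = 1 over Z/nZ and, for each solution (a, b), checks that the substitution x -> a, y -> b sends the defining relation to zero, which is what makes the corresponding homomorphism well defined. The function names and the dictionary encoding of polynomials are our own conventions.

```python
def solutions_mod(n):
    """Solutions (a, b) of x^2 + y^2 = 1 over Z/nZ. Each one corresponds to the
    ring homomorphism Z[x, y]/(x^2 + y^2 - 1) -> Z/nZ with x -> a, y -> b."""
    return [(a, b) for a in range(n) for b in range(n)
            if (a * a + b * b - 1) % n == 0]

def apply_hom(sol, n, poly):
    """Image of a polynomial under the substitution attached to a solution.
    `poly` maps exponent pairs (i, j) to integer coefficients of x^i y^j."""
    a, b = sol
    return sum(c * pow(a, i, n) * pow(b, j, n)
               for (i, j), c in poly.items()) % n

relation = {(2, 0): 1, (0, 2): 1, (0, 0): -1}     # x^2 + y^2 - 1

for n in (2, 5, 7, 9):
    sols = solutions_mod(n)
    # The relation must map to 0 for the homomorphism to be well defined.
    assert all(apply_hom(s, n, relation) == 0 for s in sols)
    print(n, len(sols))
```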

It turns out, just as one might hope, that every interesting
geometric property of varieties can be computed
by means of the coordinate ring, which means it can be 
defined, not only for varieties, but for general schemes. 
We have already seen, for instance, that a variety X is 
irreducible if and only if Γ(X) is an integral domain.
Thus, we say in general that a scheme Spec R is irre- 
ducible if and only if R is an integral domain (or, more 
precisely, if the quotient of R by its nilradical is an inte- 
gral domain). One can speak of the connectedness of a 
scheme, its dimension, whether it is smooth, and so 
forth. All these geometric properties turn out, like irreducibility,
to have purely algebraic descriptions. In fact,
to the arithmetic geometer’s way of thinking, all these 
are, at bottom, algebraic properties. 

3.4 Example: Spec Z, the Number Line 

The first ring we encounter in our mathematical 
education— and the ring that is the ultimate subject of 
number theory— is Z, the ring of integers. How does it 
fit into our picture? The scheme Spec Z has as its points
the set of prime ideals of Z, which come in two flavors:
there are the principal ideals (p), with p a prime number;
and there is the zero ideal. (The fact that these are
the only prime ideals of Z is not a triviality; it can be 
derived from the euclidean algorithm [III.22].) 

We are supposed to think of Z as the ring of “functions”
on Spec Z. How can an integer be a function? Well,
I merely need to tell you how to evaluate an integer n at
a point of Spec Z. If the point is a nonzero prime ideal
(p), then the evaluation homomorphism at (p) is precisely
the homomorphism whose kernel is (p); so the
value of n at (p) is just the reduction of n modulo p.
At the point (0), the evaluation homomorphism is the
identity map Z → Z; so the value of n at (0) is just n.
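
In code, evaluating an integer at a point of Spec Z is a one-liner. The sketch below uses our own convention of writing p = 0 for the zero ideal; the sample integer 60 is arbitrary.

```python
def value_at(n, p):
    """Value of the 'function' n at the point (p) of Spec Z.
    p = 0 stands for the zero ideal, where the value is n itself;
    otherwise p is a prime and the value is n reduced modulo p."""
    return n if p == 0 else n % p

n = 60
for p in (0, 2, 3, 5, 7, 11):
    print(f"value of {n} at ({p}):", value_at(n, p))

# In this picture 60 'vanishes' exactly at the primes dividing it.
assert [p for p in (2, 3, 5, 7, 11) if value_at(n, p) == 0] == [2, 3, 5]
```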

4 How Many Points Does a Circle Have? 

We now return to the method of section 2, paying 
particular attention to the case where the equation 
x^2 + y^2 = 1 is considered over a finite field F_p.

Let us write V for the scheme of solutions of x^2 +
y^2 = 1. For any ring R, we will denote by V(R) the set
of solutions of x^2 + y^2 = 1 over R.

If R is a finite field F_p, the set V(F_p) is a subset of F_p^2.
In particular, it is a finite set. So it is natural to wonder 
how large this set is: in other words, how many points 
does a circle have? 

In section 2, guided by our geometric intuition, we
observed that, for every m ∈ Q, the point

P_m = ((m^2 - 1)/(m^2 + 1), -2m/(m^2 + 1))

lies on V.

The algebraic computation showing that P_m satisfies
the equation x^2 + y^2 = 1 is no different over a finite
field. So we might be inclined to think that V(F_p) consists
of p + 1 points: namely, the points P_m for each
m ∈ F_p, together with (1,0).

But this is not right: for instance, when p = 5 it is
easy to check that the four points (0,1), (0,-1), (1,0),
(-1,0) make up all of V(F_5). Computing P_m for various
m, we quickly discover the problem; when m is 2
or 3, the formula for P_m does not make sense, because
the denominator m^2 + 1 is zero! This is a wrinkle we
did not see over Q, where m^2 + 1 was always positive.

What is the geometric story here? Consider the intersection
of the line L_2, that is, the line y = 2(x - 1),
with V. If (x, y) belongs to this intersection, then we
have

x^2 + (2(x - 1))^2 = 1,

5x^2 - 8x + 3 = 0.

Since 5 = 0 and 8 = 3 in F_5, the above equation can
be written as 3 - 3x = 0; in other words, x = 1, which
in turn implies that y = 0. In other words, the line L_2
intersects the circle V at only one point!
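
The single-point intersection is easy to confirm by exhaustive search. The following sketch (the function name is ours) lists, for each slope m in the field with five elements, the points lying on both the line y = m(x - 1) and the circle.

```python
def line_circle_intersection(m, p):
    """Points of F_p x F_p on both y = m(x - 1) and x^2 + y^2 = 1."""
    pts = []
    for x in range(p):
        y = m * (x - 1) % p
        if (x * x + y * y - 1) % p == 0:
            pts.append((x, y))
    return pts

for m in range(5):
    print(m, line_circle_intersection(m, 5))

# For m = 2 and m = 3 (where m^2 + 1 = 0 in F_5) only (1, 0) survives.
assert line_circle_intersection(2, 5) == [(1, 0)]
assert line_circle_intersection(3, 5) == [(1, 0)]
```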

We are left with two possibilities, both disturbing to
our geometric intuition. We might declare that L_2 is tangent
to V; but this means that V would have multiple
tangents at (1,0), since the vertical line x = 1 should
surely still be considered a tangent. The alternative is
to declare that L_2 is not tangent to V; but then we
are in the equally unsavory situation of having a line 
which, while not tangent to the circle V, intersects it 
at only one point. You are now beginning to see why I 
did not include an algebraic definition of “tangent” in 
statement (A) above! 

This quandary illustrates the nature of arithmetic 
geometry nicely. When we move into novel contexts, 
like geometry over F_p, some features stay fixed (such
as “a line intersects a circle in at most two points”),
while others have to be discarded (such as “there exists
exactly one line, which we may call the tangent line to
the circle at (1,0), that intersects the circle at (1,0) and
no other point”⁴).

Notwithstanding these subtleties, we are now ready 
to compute the number of points in V(F_p). First of
all, when p = 2 one can check directly that (0,1)
and (1,0) are the only two points in V(F_2). (Another
common refrain in arithmetic geometry is that fields
of characteristic 2 often impose technical annoyances,
and are best dealt with separately.) Having treated this
case, we assume for the rest of this section that p
is odd. It follows from basic number theory that the
equation m^2 + 1 = 0 has a solution in F_p if and only
if p ≡ 1 (mod 4), in which case there are exactly two
such m. So, if p ≡ 3 (mod 4), then every line L_m intersects
the circle at a point other than (1,0), and we have
p + 1 points in all. If p ≡ 1 (mod 4), there are two
choices of m for which L_m intersects V only at (1,0);
eliminating these two choices of m yields a total of
p - 1 points in V(F_p).

We conclude that |V(F_p)| is equal to 2 when p = 2,
to p - 1 when p ≡ 1 (mod 4), and to p + 1 when p ≡ 3
(mod 4). The interested reader will find the following


4. In this case, the right attitude to adopt is that L_2 is not tangent to
V, but that there are certain nontangent lines that intersect the circle 
at a single point. 
exercises useful: how many solutions are there to x^2 +
3y^2 = 1 over F_p? What about x^2 + y^2 = 0?
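
The point count for the circle can be checked against a brute-force search, and the same search is a convenient way to experiment with the exercises. This is a sketch under our own naming: `count_conic` counts solutions of x^2 + b*y^2 = c over the field with p elements, and `predicted` encodes the count for the circle (2 for p = 2, p - 1 for p congruent to 1 mod 4, p + 1 for p congruent to 3 mod 4).

```python
def count_conic(p, b=1, c=1):
    """Number of (x, y) in F_p x F_p with x^2 + b*y^2 = c."""
    return sum(1 for x in range(p) for y in range(p)
               if (x * x + b * y * y - c) % p == 0)

def predicted(p):
    """Count for the circle x^2 + y^2 = 1 derived in this section."""
    if p == 2:
        return 2
    return p - 1 if p % 4 == 1 else p + 1

for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]:
    assert count_conic(p) == predicted(p)

# For the exercises, try for instance count_conic(p, b=3) or count_conic(p, c=0).
```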

More generally, let X be the scheme of solutions of 
any system of equations 

F_1(x_1, ..., x_n) = 0, F_2(x_1, ..., x_n) = 0, ..., (10)

where the F_i are polynomials with integral coefficients.
Then one can associate with X a list of integers
N_2(X), N_3(X), N_5(X), ..., where N_p(X) is the number
of solutions to (10) with x_1, ..., x_n ∈ F_p. This list of
integers turns out to contain a surprising amount of
geometric information about the scheme X; even for
the simplest schemes, the analysis of these lists is a 
deep problem of intense current interest, as we will see 
in the next section. 
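
For small primes and few variables, such lists can be produced by brute force. The sketch below is an illustration only: the function name `N_p`, the representation of polynomials as Python callables, and the two sample systems are our own choices, and the running time grows like p to the power of the number of variables.

```python
from itertools import product

def N_p(polys, n_vars, p):
    """Number of common zeros in (F_p)^n_vars of the given polynomials.
    Each element of `polys` is a callable taking n_vars integers."""
    return sum(1 for pt in product(range(p), repeat=n_vars)
               if all(F(*pt) % p == 0 for F in polys))

circle = [lambda x, y: x * x + y * y - 1]          # x^2 + y^2 = 1
surface = [lambda x, y, z: x * y * z - 1]          # xyz = 1, as in section 3.2

for p in (2, 3, 5, 7, 11):
    print(p, N_p(circle, 2, p), N_p(surface, 3, p))
```

For xyz = 1 the count is (p - 1)^2, since x and y may be any nonzero elements and z is then determined.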

5 Some Problems in Classical and 
Contemporary Arithmetic Geometry 

In this section I will try to give an impression of a few of 
arithmetic geometry’s great successes, and to gesture 
at some problems of current interest for researchers in 
the area. 

A word of warning is in order. In what follows, I will 
be trying to give brief and nontechnical descriptions 
of some mathematics of extreme depth and complex- 
ity. Consequently, I will feel very free to oversimplify. 
I will try to avoid making assertions that are actually 
false, but I will often use definitions (like that of the 
L-function attached to an elliptic curve) that do not 
exactly agree with those in the literature. 

5.1 From Fermat to Birch-Swinnerton-Dyer 

The world is not lacking in expositions of the proof of 
fermat’s last theorem [V.12] and I will not attempt 
to give another one here, although it is without ques- 
tion the most notable contemporary achievement in 
arithmetic geometry. (Here I am using the mathemati- 
cian’s sense of “contemporary,” which, as the old joke 
goes, means “theorems proved since I entered graduate 
school.” The shorthand for “theorems proved before I 
entered graduate school” is “classical.”) I will content 
myself with making some comments about the struc- 
ture of the proof, emphasizing connections with the 
parts of arithmetic geometry we have discussed above. 

Fermat’s last theorem (rightly called “Fermat’s con- 
jecture,” since it is almost impossible to imagine that 
fermat [VI. 12] proved it) asserts that the equation 


where ft is an odd prime, has no solutions in positive 
integers A, B, C. 

The proof uses the crucial idea, introduced indepen- 
dently by Frey and Hellegouarch, of associating with 
any solution ( A,B,C ) of (11) a certain variety Xa ,b, 
namely the curve described by the equation 
y 2 =x(x-A*)(x + B e ). 

What can we say about N p ( Xa,b )? We begin with a sim- 
ple heuristic. There are p choices for x in F p . For each 
choice of x, there are either zero, one, or two choices 
for y, depending on whether x(x - A ( ')(x + B^) is 
a quadratic nonresidue, zero, or a quadratic residue 
in F p . Since there are equally many quadratic residues 
and nonresidues in ¥ p , we might guess that those two 
cases arise equally often. If so, there would on average 
be one choice of y for each of the p choices of x, which 
inclines us to make the estimate N p (Xa,b) ~ P- Define 
a p to be the error in this estimate: a p = p - N p (Xa,b)- 
It is worth remembering that when X was the scheme 
attached to x 2 + y 2 = 1, the behavior of p - N P (X) 
was very regular; in particular, this quantity took the 
value 1 at primes congruent to 1 mod 4 and -1 at 
primes congruent to 3 mod 4. (We note, in particular, 
that the heuristic estimate N p ( X ) ~ p is quite good in 
this case.) Might one hope that a p displays the same 
kind of regularity? 

In faet, the behavior of the a p is very irregular, as a 
famous theorem of Mazur shows; not only do the a p 
fail to vary periodically, even their reductions modulo 
various primes are irregular! 

Faet (Mazur). Suppose that f is a prime greater than 3, 
and let b be a positive integer. It is not the case that 
a p takes the same value (modf) for all primes p 
congruent to 1 (mod b). 5 

On the other hand, if I may compress a 200-page
paper into a slogan, Wiles proved that, when $A, B, C$
is a solution to (11), the reductions mod $\ell$ of the $a_p$
necessarily behaved periodically, contradicting Mazur’s
theorem when $\ell > 3$. The case $\ell = 3$ is an old theorem
of EULER [VI.19]. This completes the proof of Fermat’s
conjecture, and, I hope, bolsters our assertion that the
careful study of the values $N_p(X)$ is an interesting way
to study a variety $X$!


5. The theorem proved by Mazur is stated by him in a very different
and much more general way: he proves that certain modular curves
do not possess any rational points. This implies that a version of the
fact above is true, not only for $X_{A,B}$, but for any equation of the form
$y^2 = f(x)$, where $f$ is a cubic polynomial without repeated roots. We
will leave it to the other able treatments of Fermat to develop that


$A^\ell + B^\ell = C^\ell$, (11)
But the story does not end with Fermat. In general,
if $f(x)$ is a cubic polynomial with coefficients in $\mathbb{Z}$ and
no repeated roots, the curve $E$ defined by the equation

$y^2 = f(x)$ (12)

is called an elliptic curve [III.21] (note well that an
elliptic curve is not an ellipse). The study of rational
points on elliptic curves (that is, pairs of rational numbers
satisfying (12)) has been occupying arithmetic
geometers since before our subject existed as such;
a decent treatment of the story would fill a book, as
indeed it does fill the book of Silverman and Tate
(1992). We can define $a_p(E)$ to be $p - N_p(E)$ as above.
First of all, if our heuristic $N_p(E) \sim p$ is a good estimate,
we might expect that $a_p(E)$ is small compared
with $p$; and, in fact, a theorem of Hasse from the 1930s
shows that $|a_p(E)| \le 2\sqrt{p}$ for all but finitely many $p$.
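
Hasse's bound, that $a_p(E)$ is at most $2\sqrt{p}$ in absolute value, can be tested numerically on a sample curve. The sketch below is an illustration under stated assumptions: the cubic $f(x) = x^3 - x + 1$ is a hypothetical example chosen by us (its discriminant is $-23$), the function names are ours, and we simply skip the primes 2 and 23 as the finitely many exceptions for this curve.

```python
import math


def a_p(p, f):
    """Return a_p = p - N_p, where N_p counts pairs (x, y) mod p with y^2 = f(x)."""
    # square_counts[v] = number of y in F_p whose square is v.
    square_counts = [0] * p
    for y in range(p):
        square_counts[y * y % p] += 1
    n_p = sum(square_counts[f(x) % p] for x in range(p))
    return p - n_p


def example_cubic(x):
    # Hypothetical example curve y^2 = x^3 - x + 1 (no repeated roots).
    return x**3 - x + 1


def odd_primes_below(limit):
    """Odd primes less than limit, by trial division."""
    return [
        n for n in range(3, limit)
        if all(n % d for d in range(2, math.isqrt(n) + 1))
    ]


good_primes = [p for p in odd_primes_below(200) if p != 23]
for p in good_primes:
    error = a_p(p, example_cubic)
    # Compare squares to avoid floating point: a_p^2 <= 4p.
    print(p, error, error * error <= 4 * p)
```

Tabulating the number of square roots of each residue first keeps the count at roughly $2p$ steps per prime rather than $p^2$.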

It turns out that some elliptic curves have infinitely
many rational points, and some only finitely many. One
might expect that an elliptic curve with many points
over $\mathbb{Q}$ would tend to have more points over finite fields
as well, since the coordinates of a rational point can be
reduced mod $p$ to yield a point over the finite field $\mathbb{F}_p$.
Conversely, one might imagine that, by knowing the list
of numbers $a_p$, one could draw conclusions about the
points of $E$ over $\mathbb{Q}$.

In order to draw such conclusions, one needs a nice
way to package the information of the infinite list of
integers $a_p$. Such a package is given by the L-function
[III.49] of the elliptic curve, defined to be the following
function of a variable $s$:

$L(E, s) = \prod'_p (1 - a_p p^{-s} + p^{1-2s})^{-1}$. (13)

The notation $\prod'$ means that this product is evaluated
over all primes apart from a finite set, which is easy
to determine from the polynomial $f$. (As is often the
case, we are oversimplifying; what I have written here
differs in some irrelevant-to-us respects from what is
usually called $L(E, s)$ in the literature.) It is not hard to
check that (13) is a convergent product when $s$ is a real
number greater than $\frac{3}{2}$. Not much deeper is the fact
that the right-hand side of (13) is well-defined when $s$
is a complex number whose real part exceeds $\frac{3}{2}$. What
is much deeper, following from the theorem of Wiles,
together with later theorems of Breuil, Conrad, Diamond,
and Taylor, is that we can extend $L(E, s)$ to
a holomorphic function [I.3 §5.6] defined for every
complex number $s$.

A heuristic argument might suggest the following
relationship between the values of $N_p(E)$ and the
value of $L(E, 1)$. If the $a_p$ are typically negative (corresponding
to the $N_p(E)$ typically being greater than $p$)
the terms in the infinite product tend to be smaller
than 1; when the $a_p$ are positive, the terms in the
product tend to be larger than 1. In particular, one
might expect the value of $L(E, 1)$ to be closer to 0
when $E$ has many rational points. Of course, this
heuristic should be taken with a healthy pinch of salt,
given that $L(E, 1)$ is not in fact defined by the infinite
product on the right-hand side of (13)! Nonetheless,
THE BIRCH-SWINNERTON-DYER CONJECTURE [V.4],
which makes precise the heuristic prediction above,
is widely believed, and supported by many partial
results and numerical experiments. We do not have the
space here to state the conjecture in full generality.
However, the following conjecture would follow from
Birch-Swinnerton-Dyer.

Conjecture. The elliptic curve E has infinitely many 
points over Q if and only if L(E, 1) = 0. 
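
The heuristic that precedes this conjecture can be made a little more explicit by formally setting $s = 1$ in a single factor of the product (13). This is only a formal manipulation, since the product is not known to converge at $s = 1$, but it shows where the comparison with 1 comes from:

```latex
\[
  \left(1 - a_p\,p^{-1} + p^{-1}\right)^{-1}
  \;=\; \frac{p}{p - a_p + 1}
  \;=\; \frac{p}{N_p(E) + 1},
  \qquad \text{using } a_p = p - N_p(E).
\]
```

So a factor is less than 1 when $N_p(E) \ge p$ and greater than 1 when $N_p(E) < p - 1$, which is the sense in which many points over the finite fields should push $L(E, 1)$ toward 0.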

Kolyvagin proved one direction of this conjecture 
in 1988: that E has finitely many rational points if 
$L(E, 1) \neq 0$. (To be precise, he proved a theorem that
yields the assertion here once combined with the later 
theorems of Wiles and others.) It follows from a the- 
orem of Gross and Zagier that E has infinitely many 
rational points if L(E,s) has a simple zero at s = 1. That 
more or less sums up our present knowledge about the 
relationship between L-functions and rational points 
on elliptic curves. This lack of knowledge has not, how- 
ever, prevented us from constructing a complex of ever 
more rarefied conjectures in the same vein, of which 
the Birch-Swinnerton-Dyer conjecture is only a tiny and 
relatively down-to-earth sliver.

Before we leave the subject of counting points behind,
we will pause and point out one more beautiful
result: the theorem of ANDRÉ WEIL [VI.93] bounding
the number of points on a curve over a finite field.
(Because we have not introduced projective geometry,
we will satisfy ourselves with a somewhat less beautiful
formulation than the usual one.) Let $F(x, y)$ be an
irreducible polynomial in two variables, and let $X$ be
the scheme of solutions of $F(x, y) = 0$. Then the complex
points of $X$ define a certain subset of $\mathbb{C}^2$, which we
call an algebraic curve. Since $X$ is obtained by imposing
one polynomial condition on the points of $\mathbb{C}^2$, we
expect that $X$ has complex dimension 1, which is to say
it has real dimension 2. Topologically speaking, $X(\mathbb{C})$ is,
therefore, a surface. It turns out that, for almost all
choices of $F$, the surface $X(\mathbb{C})$ will have the topology
of a “$g$-holed doughnut” with $d$ points removed, for
some nonnegative integers $g$ and $d$. In this case we say
that $X$ is a curve of genus $g$.

In section 2 we saw that the behavior of schemes over
finite fields seemed to “remember” facts arising from
our geometric intuition over $\mathbb{R}$ and $\mathbb{C}$: our example there
was the fact that circles and lines intersect in at most
two points.

The theorem of Weil reveals a similar, though much
deeper, phenomenon.

Fact. Suppose the scheme $X$ of solutions of $F(x, y) = 0$
is a curve of genus $g$. Then, for all but finitely many
primes $p$, the number of points of $X$ over $\mathbb{F}_p$ is at most
$p + 1 + 2g\sqrt{p}$ and at least $p + 1 - 2g\sqrt{p} - d$.

Weil’s theorem illustrates the startlingly close bonds
between geometry and arithmetic. The more complicated
the topology of $X(\mathbb{C})$, the further the number of
$\mathbb{F}_p$-points can vary from the “expected” answer of $p$.
What is more, it turns out that knowing the size of
the set $X(\mathbb{F}_q)$ for every finite field $\mathbb{F}_q$ allows us to
determine the genus of $X$. In other words, the finite
sets of points $X(\mathbb{F}_q)$ somehow “remember” the topology
of the space of complex points $X(\mathbb{C})$! In modern
language, we say that there is a theory applying to general
schemes, called étale cohomology, which mimics
the theory of cohomology applying to the topology of
varieties over $\mathbb{C}$.

Let us return for a moment to our favorite curve, by
taking the polynomial $F(x, y) = x^2 + y^2 - 1$. In this
case, it turns out that $X(\mathbb{C})$ has $g = 0$ and $d = 2$:
our previous result that $X(\mathbb{F}_p)$ contains either $p + 1$ or
$p - 1$ points therefore conforms exactly with the Weil
bounds. We also remark that elliptic curves always have
genus 1; so the theorem of Hasse alluded to above is a
special case of Weil’s theorem as well.

Recall from section 2 that the solutions to $x^2 + y^2 = 1$,
over $\mathbb{R}$, over $\mathbb{Q}$, or over various finite fields,
could be parametrized by the variable $m$. It was this
parametrization that enabled us to determine a simple
formula for the size of $X(\mathbb{F}_p)$ in this case. We
remarked earlier that most schemes could not be so
parametrized; now we can make that statement a bit
more precise, at least for algebraic curves.

Fact. If $X$ is a genus-0 curve, then the points of $X$ can
be parametrized by a single variable.

The converse of this fact is more or less true as well
(though stating it properly requires us to say more than
we can here about “singular curves”). In other words, a
thoroughly algebraic question, whether the solutions
of a Diophantine equation can be parametrized, is
hereby given a geometric answer.
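
For the circle, parametrization by a single variable can be made concrete. The sketch below assumes one standard parametrization, which sends a rational number $m$ to the point $((1 - m^2)/(1 + m^2),\ 2m/(1 + m^2))$; the formula used in section 2 is not repeated in this part of the article, so this choice, like the name `circle_point`, is our assumption.

```python
from fractions import Fraction


def circle_point(m):
    """Map a rational parameter m to a rational point on x^2 + y^2 = 1."""
    m = Fraction(m)
    denominator = 1 + m * m          # never zero for a rational m
    return (1 - m * m) / denominator, 2 * m / denominator


for m in [0, 1, Fraction(1, 2), Fraction(-3, 7), 5]:
    x, y = circle_point(m)
    # Exact rational arithmetic, so the final column is an exact check.
    print(m, x, y, x * x + y * y == 1)
```

Because `Fraction` arithmetic is exact, every value of $m$ yields a genuine rational point rather than a floating-point approximation to one.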

5.2 Rational Points on Curves 

As we said above, some elliptic curves (which are curves
of genus 1) have finitely many rational points, and
others have infinitely many. What is the situation for
algebraic curves of other flavors?

We have already encountered a curve of genus 0 with
infinitely many points: namely, the curve $x^2 + y^2 = 1$.
On the other hand, the curve $x^2 + y^2 = 7$ also has
genus 0, and a simple modification of the argument of
the first section shows that this curve has no rational
points. It turns out these are the only two possibilities.

Fact. If $X$ is a curve of genus 0, then $X(\mathbb{Q})$ is either
empty or infinite.

Genus-1 curves are known to fall into a similar
dichotomy, thanks to the theorem of Mazur we alluded
to earlier.

Fact. If $X$ is a genus-1 curve, then either $X$ has at most
sixteen rational points or it has infinitely many rational
points.

What about curves of higher genus? In the early 
1920s, Mordell made the following conjecture. 

Conjecture. If $X$ is a curve of genus greater than 1,
then $X$ has finitely many rational points.

This conjecture was proved by Faltings in 1983;
in fact, he proved a more general theorem of which
this conjecture is a special case. It is worth remarking
that the work of Faltings involves a great deal of
importation of geometric intuition to the study of the
scheme Spec $\mathbb{Z}$.

When you prove that a set is finite, it is natural to
wonder whether you can bound its size. For example, if
$f(x)$ is a degree 6 polynomial with no repeated roots,
the curve $y^2 = f(x)$ turns out to have genus 2; so by
Faltings’s theorem there are only finitely many pairs of
rational numbers $(x, y)$ satisfying $y^2 = f(x)$.

Question. Is there a constant $B$ such that, for all
degree 6 polynomials with coefficients in $\mathbb{Q}$ and no
repeated roots, the equation $y^2 = f(x)$ has at most
$B$ solutions?
This question remains open, and I do not think there
is a strong consensus about whether the answer will be
yes or no. The current world record is held by the curve

$y^2 = 378371081 x^2 (x^2 - 9)^2 - 229833600 (x^2 - 1)^2$,

constructed by Keller and Kulesz, which has 588
rational points.

Interest in the above question comes from its relation
to a conjecture of Lang, which involves points
on higher-dimensional varieties. Caporaso, Harris, and
Mazur showed that Lang’s conjecture implies a positive
answer to the question above. This suggests a natural
attack on the conjecture: if one can find a way to
construct an infinite sequence of degree 6 polynomials
$f(x)$ so that the equations $y^2 = f(x)$ have ever more
numerous rational solutions, then one has a disproof
of Lang’s conjecture! No one has yet been successful
at this task. If one could prove that the answer to the
question above was affirmative, it would probably bolster
our faith in the correctness of Lang’s conjecture,
though of course it would bring us no nearer to turning
the conjecture into a theorem.

In this article we have seen only a glimpse of the 
modern theory of arithmetic geometry, and perhaps I 
have overemphasized mathematicians’ successes at the 
expense of the much larger territory of questions, like 
Lang’s conjecture above, about which we remain wholly 
ignorant. At this stage in the history of mathematics, 
we can confidently say that the schemes attached to 
Diophantine problems have geometry. What remains 
is to say as much as we can about what this geom- 
etry is like, and in this respect, despite the progress 
described here, our understanding is still quite unsat- 
isfactory when compared with our knowledge of more 
classical geometric situations. 
IV.6 Algebraic Topology

Burt Totaro 


Introduction 

Topology is concerned with the properties of a geomet- 
ric shape that are unchanged when we continuously 
deform it. In more technical terms, topology tries to
classify topological spaces [III.92], where two spaces
are considered the same if they are homeomorphic.
Algebraic topology assigns numbers to a topological
space, which can be thought of as the “number of holes”
in that space. These holes can be used to show that
two spaces are not homeomorphic: if they have differ- 
ent numbers of holes of some kind, then one cannot 
be a continuous deformation of the other. In the happi- 
est cases, we can hope to show the converse statement: 
that two spaces with the same number of holes (in some 
precise sense) are homeomorphic. 

Topology is a relatively new branch of mathematics, 
with its origins in the nineteenth century. Before that, 
mathematics usually sought to solve problems exactly: 
to solve an equation, to find the path of a falling body, 
to compute the probability that a game of dice will 
lead to bankruptcy. As the complexity of mathemati- 
cal problems grew, it became clear that most problems 
would never be solved by an exact formula: a classic 
example is the problem, known as the three-body 
problem [V.36], of computing the future movements 
of Earth, the Sun, and the Moon under the influence of 
gravity. Topology allows the possibility of making qualitative
predictions when quantitative ones are impossible.
For example, a simple topological fact is that a trip
from New York to Montevideo must cross the equator 
at some point, although we cannot say exactly where. 

1 Connectedness and Intersection Numbers 

Perhaps the simplest topological property is one called 
connectedness. This can be defined in various ways, as 
we shall see in a moment, but once we have a notion of 
what it means for a space to be connected we can then 
divide a topological space up into connected pieces, 
called components. The number of these pieces is a simple
but useful invariant [I.4 §2.2]: if two spaces have
different numbers of connected components, then they 
are not homeomorphic. 

For nice topological spaces, the different definitions 
of connectedness are equivalent. However, they can be 
generalized to give ways of measuring the number of 
holes in a space; these generalizations are interestingly 
different and all of them are important. 

The first interpretation of connectedness uses the
notion of a path, which is defined to be a continuous
mapping $f$ from the unit interval $[0, 1]$ to a given space
$X$. (We think of $f$ as a path from $f(0)$ to $f(1)$.) Let us
declare two points of $X$ to be equivalent if there is a
path from one to the other. The set of equivalence
classes [I.2 §2.3] is called the set of path components
of $X$ and is written $\pi_0(X)$. This is a very natural way of
defining the “number of connected pieces” into which
$X$ breaks up. One can generalize this notion by considering
mappings into $X$ from other standard spaces
such as spheres: this leads to the notion of homotopy
groups, which will be the topic of section 2.

A different way of thinking about connectedness is
based on functions from $X$ to the real line rather than
functions from a line segment into $X$. Let us assume
that we are in a situation where it makes sense to differentiate
functions on $X$. For example, $X$ could be an
open subset of some Euclidean space, or more generally
a smooth manifold [I.3 §6.9]. Consider all the real-valued
functions on $X$ whose derivative is everywhere
equal to zero: these functions form a real vector space
[I.3 §2.3], which we call $H^0(X, \mathbb{R})$ (the “zeroth cohomology
group of $X$ with real coefficients”). Calculus tells
us that if a function defined on an interval has derivative
zero, then it must be constant, but that is not true
when the domain has several connected pieces: all we
can say then is that the function is constant on each
connected piece of $X$. The number of degrees of freedom
of such a function is therefore equal to the number
of connected pieces, so the dimension of the vector
space $H^0(X, \mathbb{R})$ is another way to describe the number
of connected components of $X$. This is the simplest
example of a cohomology group. Cohomology will be
discussed in section 4.
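
A discrete analogue may make the dimension count more tangible. In the sketch below, which is our own illustration and not a construction from the text, the space is replaced by a finite graph, and "derivative zero" is replaced by the condition that a function on the vertices takes equal values at the two ends of every edge. The dimension of the space of such functions is computed by linear algebra and compared with a direct count of connected pieces; all function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a rational matrix, by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(entry) for entry in row] for row in rows]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0
    for col in range(n_cols):
        pivot_row = next(
            (r for r in range(pivots, len(rows)) if rows[r][col] != 0), None
        )
        if pivot_row is None:
            continue
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def dim_h0(n_vertices, edges):
    """Dimension of {f on vertices : f(v) - f(u) = 0 for every edge (u, v)}."""
    difference_matrix = []
    for u, v in edges:
        row = [0] * n_vertices
        row[u] -= 1
        row[v] += 1
        difference_matrix.append(row)
    # Kernel dimension = number of unknowns minus rank.
    return n_vertices - rank(difference_matrix)


def count_components(n_vertices, edges):
    """Count connected pieces directly, exploring from each unvisited vertex."""
    neighbors = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    seen, components = set(), 0
    for start in range(n_vertices):
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(neighbors[v])
    return components


# A triangle on vertices 0, 1, 2, a segment on 3, 4, and vertex 5 on its own.
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
print(dim_h0(6, edges), count_components(6, edges))
```

The two functions use quite different methods, one algebraic and one by exploring paths, which mirrors the two interpretations of connectedness described above.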

We can use the idea of connectedness to prove a serious
theorem of algebra: every real polynomial of odd
degree has a real root. For example, there must be some
real number $x$ such that $x^3 + 3x - 4 = 0$. The basic
observation is that when $x$ is a large positive number
or a highly negative number, the term $x^3$ is much bigger
(in absolute value) than the other terms of the polynomial.
Since this top term is an odd power of $x$, the polynomial $f$
satisfies $f(x) > 0$ for some positive number $x$ and $f(x) < 0$ for
some negative number $x$. If $f$ were never equal to zero,
then it would be a continuous mapping from the real
line into the real line minus the origin. But the real line
is connected, while the real line minus the origin has
two connected components, the positive and negative
numbers. It is easy to show that a continuous map from
a connected space $X$ to another space $Y$ must map $X$
into just one connected component of $Y$: in our case,
this contradicts the fact that $f$ takes both positive and
negative values. Therefore $f$ must be equal to zero at
some point, and the proof is complete.
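
The argument shows that a root exists without saying where it is. A numerical counterpart is bisection, which repeatedly halves an interval on which the polynomial changes sign. The sketch below applies it to the example $x^3 + 3x - 4$; the function name `find_root`, the starting interval, and the tolerance are our own choices.

```python
def f(x):
    return x**3 + 3 * x - 4


def find_root(func, negative_point, positive_point, tolerance=1e-12):
    """Bisection: shrink an interval on which func changes sign."""
    lo, hi = negative_point, positive_point
    if not (func(lo) < 0 < func(hi)):
        raise ValueError("need func(negative_point) < 0 < func(positive_point)")
    while abs(hi - lo) > tolerance:
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid      # the sign change lies between mid and hi
        else:
            hi = mid      # the sign change lies between lo and mid
    return (lo + hi) / 2


root = find_root(f, -10.0, 10.0)
print(root, f(root))
```

For this particular polynomial the root can also be found by inspection, since $1 + 3 - 4 = 0$, which gives an independent check on the numerical answer.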


Figure 1 Intersection numbers: (a) $A \cdot B = 1$; (b) $A \cdot C = -1$.




This argument can be phrased in terms of the “inter- 
mediate value theorem” of calculus, which is indeed 
one of the most basic topological theorems. An equivalent
reformulation of this theorem states that a continuous
curve that goes from the lower half-plane to
the upper half-plane must cross the horizontal axis at 
some point. This idea leads to intersection numbers, 
one of the most useful concepts in topology. Let M 
be a smooth oriented manifold. (Roughly speaking, a 
manifold is oriented if you cannot continuously slide 
a shape about inside it and end up with a reflection 
of that shape. The simplest nonoriented manifold is a 
Möbius strip: to reflect a shape, slide it around the strip
an odd number of times.) Let A and B be two closed 
oriented submanifolds of M with dimensions adding 
up to the dimension of M. Finally, suppose that A and 
B intersect transversely, so that their intersection has 
the “correct” dimension, namely 0, and is therefore a 
collection of separated points. 

Now let p be one of these points. There is a way of 
assigning a weight of +1 or -1 to p, which depends 
in a natural way on the relationship between the ori- 
entations of A, B, and M (see figure 1). For example, 
if M is a sphere, A is the equator of M, B is a closed 
curve, and appropriate directions are given to A and 
B, then the weight of p will tell you whether B crosses 
A upwards or downwards at p. If A and B intersect in 
only finitely many points, then we can define the intersection
number of $A$ and $B$, written $A \cdot B$, to be the sum
of the weights ($+1$ or $-1$) at all the intersection points.
In particular, this will happen if $M$ is compact [III.9]
(that is, we can think of it as a closed bounded subset
of $\mathbb{R}^N$ for some $N$).

The important point about the intersection number
is that it is an invariant, in the following sense: if you
move $A$ and $B$ about in a continuous way, ending up
with another pair of transverse submanifolds $A'$ and
$B'$, then the intersection number $A' \cdot B'$ is the same as
$A \cdot B$, even though the number of intersection points
can change. To see why this might be true, consider
again the case where $A$ and $B$ are curves and $M$ is
two-dimensional: if $A$ and $B$ meet at a point with weight
1, we can wiggle one of them to turn that point into
three points with weights $1$, $-1$, and $1$, but the total
contribution to the intersection number is unchanged.
This is illustrated in figure 2. As a result, the intersection
number $A \cdot B$ is defined for any two submanifolds
of complementary dimension: if they do not intersect
transversely, one can move them until they do and use
the definition we have just given.

Figure 3 A surface bounded by a knot.

In particular, if two submanifolds have nonzero intersection
number, then they can never be moved to be disjoint
from each other. This is another way to describe
the earlier arguments about connectedness. It is easy
to write down one curve from New York to Montevideo
whose intersection number with the equator is equal to 
1. Therefore, no matter how we move that curve (pro- 
vided that we keep the endpoints fixed: more generally, 
if either A or B has a boundary, then that boundary 
should be kept fixed), its intersection number with the 
equator will always be 1, and in particular it must meet 
the equator in at least one point. 
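
The invariance of the signed count under wiggling can be checked on a toy model. In the sketch below, which is our own simplification, the equator is the horizontal axis in the plane, the moving curve is a polygonal path recorded by the heights of its vertices, and an upward crossing is given weight $+1$ and a downward one weight $-1$ (an orientation convention we assume for illustration).

```python
def intersection_number(heights):
    """Signed number of crossings of the axis y = 0 by a polygonal curve.

    heights lists the y-coordinates of successive vertices. We assume none
    of them is zero, so that every crossing is transverse.
    """
    if any(h == 0 for h in heights):
        raise ValueError("a vertex lies on the axis; perturb the curve first")
    total = 0
    for before, after in zip(heights, heights[1:]):
        if before < 0 < after:
            total += 1        # crossing upwards
        elif after < 0 < before:
            total -= 1        # crossing downwards
    return total


def crossing_points(heights):
    """Unsigned number of crossings, which is not an invariant."""
    return sum(1 for a, b in zip(heights, heights[1:]) if a * b < 0)


direct = [-2, 3]              # starts below the axis, ends above: one crossing
wiggled = [-2, 1, -1, 3]      # same endpoints, but three crossings
print(intersection_number(direct), crossing_points(direct))
print(intersection_number(wiggled), crossing_points(wiggled))
```

Both curves have the same endpoints, so the signed count agrees even though the number of crossing points does not.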

One of many applications of intersection numbers in
topology is the idea of linking numbers, which comes
from knot theory [III.46]. A knot is a path in space
that begins and ends at the same point, or, more formally,
a closed connected one-dimensional submanifold
of $\mathbb{R}^3$. Given any knot $K$, it is always possible to
find a surface $S$ in $\mathbb{R}^3$ with $K$ as its boundary (see figure 3).
Now let $L$ be a knot that is disjoint from $K$. The
linking number of $K$ with $L$ is defined to be the intersection
number of $L$ with the surface $S$. The properties
of intersection numbers imply that if the linking number
of $K$ with $L$ is nonzero, then the knots $K$ and $L$ are
“linked,” in the sense that it is impossible to pull them
apart.

Figure 4 Multiplication in the fundamental
group and in higher homotopy groups.

2 Homotopy Groups 

If we remove the origin from the plane $\mathbb{R}^2$, then we
obtain a new space that is different from the plane in a
fundamental way: it has a hole in it. However, we cannot
detect this difference by counting components, since
both the plane and the plane without the origin are connected.
We begin this section by defining an invariant
called the fundamental group, which does detect this
kind of hole.

As a first approximation, one could say that the elements
of the fundamental group of a space $X$ are loops,
which can be formally defined as continuous functions
$f$ from $[0, 1]$ to $X$ such that $f(0) = f(1)$. However,
this is not quite accurate, for two reasons. The first
reason, which is extremely important, is that two loops
are regarded as equivalent if one can be continuously
deformed to the other while all the time staying inside
$X$. If this is the case, we say that they are homotopic. To
be more formal about this, let us suppose that $f_0$ and
$f_1$ are two loops. Then a homotopy between $f_0$ and $f_1$
is a collection of loops $f_s$ in $X$, one for each $s$ between
0 and 1, such that the function $F(s, t) = f_s(t)$ is a continuous
function from $[0, 1]^2$ to $X$. Thus, as $s$ increases
from 0 to 1, the loop $f_s$ moves continuously from $f_0$ to
$f_1$. If two loops are homotopic, then we count them as
the same. So the elements of the homotopy group are
not actually loops but equivalence classes, or homotopy
classes, of loops.

Even this is not quite correct, because for technical
reasons we need to impose an extra condition on our
loops: that they all start from (and therefore end at)
a given point, called the base point. If $X$ is connected,
it turns out not to matter what this base point is, but
we need it to be the same for all loops. The reason for
this is that it gives us a way to multiply two loops: if $x$
is the base point and $A$ and $B$ are two loops that start
and end at $x$, then we can define a new loop by going
around $A$ and then going around $B$. This is illustrated
in figure 4. We regard this new loop as the product of
the loops $A$ and $B$. It is not hard to check that the homotopy
class of this product depends only on the homotopy
classes of $A$ and $B$, and that the resulting binary
operation turns the set of homotopy classes of loops
into a group [I.3 §2.1]. It is this group that we call the
fundamental group of $X$. It is denoted $\pi_1(X)$.

The fundamental group can be computed for most of
the spaces we are likely to encounter. This makes it an
important way to distinguish one space from another.
First of all, for any $n$ the fundamental group of $\mathbb{R}^n$ is the
trivial group with just one element, because any loop in
$\mathbb{R}^n$ can be continuously shrunk to its base point. On the
other hand, the fundamental group of $\mathbb{R}^2 \setminus \{0\}$, the plane
with the origin removed, is isomorphic to the group $\mathbb{Z}$
of the integers. This tells us that we can associate with
any loop in $\mathbb{R}^2 \setminus \{0\}$ an integer that does not change
if we modify the loop in a continuous way. This integer
is known as the winding number. Intuitively, the
winding number measures the total number of times
that the mapping goes around the origin, with counterclockwise
circuits counting positively and clockwise
ones negatively. Since the fundamental group of $\mathbb{R}^2 \setminus \{0\}$
is not the trivial group, $\mathbb{R}^2 \setminus \{0\}$ cannot be homeomorphic
to the plane. (It is an interesting exercise to try to
find an elementary proof of this result, that is, a proof
that does not use, or implicitly reconstruct, any of the
machinery of algebraic topology. Such proofs do exist,
but it is tricky to find them.)
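
For a loop given concretely, the winding number can be computed by adding up the signed angles swept out at the origin. The sketch below does this for closed polygonal loops; the representation of a loop as a list of vertices and the names `winding_number` and `circuit` are our assumptions, made only for illustration.

```python
import math


def winding_number(vertices):
    """Winding number about the origin of a closed polygonal loop.

    vertices is a list of (x, y) points; the loop joins each vertex to the
    next and the last back to the first. We assume no edge passes through
    the origin.
    """
    total_angle = 0.0
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        # Signed angle at the origin swept out by this edge.
        total_angle += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return round(total_angle / (2 * math.pi))


def circuit(turns, steps_per_turn=60):
    """Polygonal loop based at (1, 0) going around the unit circle `turns` times."""
    steps = steps_per_turn * max(abs(turns), 1)
    return [
        (math.cos(2 * math.pi * turns * k / steps),
         math.sin(2 * math.pi * turns * k / steps))
        for k in range(steps)
    ]


far_loop = [(3 + x, y) for x, y in circuit(1)]   # a circle that misses the origin
print(winding_number(circuit(2)), winding_number(circuit(-1)),
      winding_number(far_loop))
# Going around one loop and then another adds the two integers.
print(winding_number(circuit(2) + circuit(-1)))
```

The last line concatenates two loops with the same base point, so it also illustrates that the winding number turns the product of loops into addition of integers.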

A classic application of the fundamental group is 
to prove THE FUNDAMENTAL THEOREM OF ALGEBRA 
[V.15], which states that every nonconstant polynomial
with complex coefficients has a complex root. (The
proof is sketched in the article just cited, though the 
fundamental group is not explicitly mentioned there.) 

The fundamental group tells us about the number 
of “one-dimensional holes” that a space has. A basic 
example is given by the circle, which has fundamental 
group $\mathbb{Z}$, just as $\mathbb{R}^2 \setminus \{0\}$ does, and for essentially the
same reason: given a path in the circle that begins and 
ends at the same point, we can see how many times it 
goes around the circle. In the next section we shall see 
some more examples. 


Before we think about higher-dimensional holes, we
first need to discuss one of the most important topological
spaces: the $n$-dimensional sphere. For any natural
number $n$, this is defined to be the set of points in $\mathbb{R}^{n+1}$
at distance 1 from the origin. It is denoted $S^n$. Thus, the
0-sphere $S^0$ consists of two points, the 1-sphere $S^1$ is
the circle, and the 2-sphere $S^2$ is the usual sphere, like
the surface of Earth. Higher-dimensional spheres take a
little bit of getting used to, but we can work with them
in the same way that we can with lower-dimensional
spheres. For example, we can construct the 2-sphere
from a closed two-dimensional disk by identifying all
the points on the boundary circle with each other. In
the same way, the 3-sphere can be obtained from a solid
three-dimensional ball by identifying all the points on
the boundary 2-sphere. A related picture is to think
of the 3-sphere as being obtained from our familiar
three-dimensional space $\mathbb{R}^3$ by adding one point “at
infinity.”

Now let us think about the familiar sphere $S^2$. This
has trivial fundamental group, since any loop drawn
on the sphere can be shrunk to a point. However, this
does not mean that the topology of $S^2$ is trivial. It just
means that in order to detect its interesting properties 
we need a different invariant. And it is possible to base 
such an invariant on the observation that even if loops 
can always be shrunk, there are other maps that cannot. 
Indeed, the sphere itself cannot be shrunk to a point. 
To say this more formally, the identity map from the 
sphere to itself is not homotopic to a map from the 
sphere to just one point. 

This idea leads to the notion of higher-dimensional 
homotopy groups of a topological space X. The rough 
idea is to measure the number of “n-dimensional holes” 
in X, for any natural number n, by considering all the 
continuous mappings from the n-sphere to X. We want 
to see whether any of these spheres wrap around a hole 
in $X$. Once again, we consider two mappings from $S^n$
to $X$ to be equivalent if they are homotopic. And the
elements of the $n$th homotopy group $\pi_n(X)$ are again
defined to be the homotopy classes of these mappings.

Let $f$ be a continuous map from $[0, 1]$ to $X$ with
$f(0) = f(1) = x$. If we like we can turn the interval
$[0, 1]$ into the circle $S^1$ by “identifying” the points 0
and 1: then $f$ becomes a map from $S^1$ to $X$, with one
specified point in $S^1$ mapping to $x$. In order to be able to
define a group operation for mappings from a higher-dimensional
$S^n$, we similarly fix a point $s$ in $S^n$ and a
base point $x$ in $X$ and look just at maps that send $s$
to $x$.

Let $A$ and $B$ be two continuous mappings from $S^n$ to
$X$ with this property. The “product” mapping $A \cdot B$ from
$S^n$ to $X$ is defined as follows. First “pinch” the equator
of $S^n$ down to a point. When $n = 1$, the equator consists
of just two points and the result is a figure eight.
Similarly, for general $n$, we end up with two copies of
$S^n$ that touch each other, one made out of the northern
hemisphere and one out of the southern hemisphere of
the original unpinched copy of $S^n$. We now use the map
$A$ to map the bottom half into $X$ and the map $B$ to map
the top half into $X$, with the equator mapping to the
base point $x$. (For both halves, the pinched equator is
playing the part of the point $s$.)

As in the one-dimensional case, this operation makes 
the set $\pi_n(X)$ into a group, and this group is the $n$th
homotopy group of the space X. One can think of it 
as measuring how many “n-dimensional holes” a space 
has. 

These groups are the beginning of “algebraic” topol- 
ogy: starting from any topological space, we construct 
an algebraic object, in this case a group. If two spaces 
are homeomorphic, then their fundamental groups 
(and higher homotopy groups) must be isomorphic. 
This is richer than the original idea of just measur- 
ing the number of holes, since a group contains more 
information than just a number. 

Any continuous function from $S^n$ into $\mathbb{R}^m$ can be continuously
shrunk to a point in a straightforward way.
This shows that all the higher homotopy groups of $\mathbb{R}^m$
are also trivial, which is a precise formulation of the
vague idea that $\mathbb{R}^m$ has no holes.

Under certain circumstances one can show that two 
different topological spaces X and Y must have the 
same number of holes of all types. This is clearly true if 
X and Y are homeomorphic, but it is also true if X and 
Y are equivalent in a weaker sense, known as homotopy 
equivalence. Let $X$ and $Y$ be topological spaces and let
$f_0$ and $f_1$ be continuous maps from $X$ to $Y$. A homotopy
from $f_0$ to $f_1$ is defined more or less as it was for
spheres: it is a continuous family of continuous maps
from $X$ to $Y$ that starts with $f_0$ and ends with $f_1$. As
then, if such a homotopy exists, we say that $f_0$ and $f_1$
are homotopic. Next, a homotopy equivalence from a
space $X$ to a space $Y$ is a continuous map $f : X \to Y$
such that there is another continuous map $g : Y \to X$
with the property that the composition $g \circ f : X \to X$ is
homotopic to the identity map on $X$, and $f \circ g : Y \to Y$
is homotopic to the identity map on $Y$. (Notice that if we
replaced the word “homotopic” with “equal,” we would
obtain the definition of a homeomorphism.) When there
is a homotopy equivalence from $X$ to $Y$, we say that $X$
and $Y$ are homotopy equivalent, and also that $X$ and $Y$
have the same homotopy type.

Figure 5 Some spaces that are
homotopy equivalent to the circle.

A good example is when X is the unit circle and Y
is the plane with the origin removed. We have already
observed that these have the same fundamental group,
and commented that it was “for essentially the same
reason.” Now we can be more precise. Let $f : X \to Y$
be the map that takes (x, y) to (x, y) (where the first
(x, y) belongs to the circle and the second to the plane).
Let $g : Y \to X$ be the map that takes (u, v) to

$$\left( \frac{u}{\sqrt{u^2 + v^2}}, \; \frac{v}{\sqrt{u^2 + v^2}} \right).$$

(Note that $u^2 + v^2$ is never zero because the origin is not
contained in Y.) Then $g \circ f$ is easily seen to equal the
identity on the unit circle, so it is certainly homotopic to
the identity. As for $f \circ g$, it is given by the same formula
as g itself. More geometrically, it takes the points on 
each radial line to the point where that line intersects 
the unit circle. It is not hard to show that this map is 
homotopic to the identity on Y. (The basic idea is to 
“shrink the radial lines down” to the points where they 
intersect the circle.) 
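
The homotopy equivalence between the circle and the punctured plane can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are illustrative, and the straight-line homotopy is one natural way (assumed here) of making "shrink the radial lines down" precise.

```python
import math

def f(p):
    """Inclusion of the unit circle X into the punctured plane Y."""
    return p

def g(q):
    """Radial projection of the punctured plane Y onto the unit circle X."""
    u, v = q
    r = math.hypot(u, v)  # nonzero because the origin is not in Y
    return (u / r, v / r)

def homotopy(q, t):
    """Straight-line path from q (at t = 0) to f(g(q)) (at t = 1)."""
    a, b = f(g(q))
    return ((1 - t) * q[0] + t * a, (1 - t) * q[1] + t * b)

q = (3.0, -4.0)
print(g(q))              # (0.6, -0.8)
print(homotopy(q, 0.0))  # (3.0, -4.0)
print(homotopy(q, 1.0))  # (0.6, -0.8)
```

Each intermediate point is a positive multiple of q, so the path never passes through the origin and the whole deformation stays inside Y.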

Very roughly speaking, two spaces are homotopy 
equivalent if they have the same number of holes of 
all types. This is a more flexible notion of “having the 
same shape” than the notion of homeomorphism. For 
example, Euclidean spaces of different dimensions are 
not homeomorphic to each other, but they are all homo- 
topy equivalent. Indeed, they are all homotopy equiva- 
lent to a point: such spaces are called contractible, and 
one thinks of them as the spaces that have no hole of 
any sort. The circle is not contractible, but it is homotopy
equivalent to many other natural spaces: the plane
$\mathbb{R}^2$ minus the origin (as we have seen), the cylinder
$S^1 \times \mathbb{R}$, the compact cylinder $S^1 \times [0,1]$, and even the
Möbius strip (see figure 5). Most invariants in algebraic
topology (such as homotopy groups and cohomology 
groups) are the same for any two spaces that are homo- 
topy equivalent. Thus, knowing that the fundamental 
group of the circle is isomorphic to the integers tells us
that the same is true for the various homotopy equiva- 
lent spaces just mentioned. Roughly speaking, this says 
that all these spaces have “one basic one-dimensional 
hole.” 

3 Calculations of the Fundamental Group 
and Higher Homotopy Groups 

To give some more feeling for the fundamental group, 
let us review what we already know and look at a 
few more examples. The fundamental group of the 2- 
sphere, or indeed of any higher-dimensional sphere, is 
trivial. The two-dimensional torus $S^1 \times S^1$ has fundamental
group $\mathbb{Z}^2 = \mathbb{Z} \times \mathbb{Z}$. Thus, a loop in the torus determines
two integers, which measure how many times it
winds around in the meridian direction and how many 
in the longitudinal direction. 

The fundamental group can also be non-Abelian; that
is, we can have $ab \neq ba$ for some elements a and b
of the fundamental group. The simplest example is a
space X built out of two circles that meet at a single
point (see figure 6). The fundamental group of X
is the free group [IV.10 §2] on two generators a and
b. Roughly speaking, an element of this group is any
product you can write down using the generators and
their inverses, such as $abaab^{-1}a$, except that if a and
$a^{-1}$ or b and $b^{-1}$ appear next to each other, you cancel
them first. (So instead of $abb^{-1}bab^{-1}$ one would simply
write $abab^{-1}$, for example.) The generators correspond
to loops around each of the two circles. The free
group is in a sense the most highly non-Abelian group.
In particular, ab is not equal to ba, which in topological
terms tells us that going around loop a and then
loop b in the space X is not homotopic to the loop that 
goes around loop b and then loop a. 
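
The cancellation rule for words in the free group is easy to experiment with. Here is a small sketch, under an encoding chosen purely for illustration: lowercase letters are the generators a and b, and the uppercase letters A and B stand for their inverses. None of these names come from the text.

```python
def reduce_word(word):
    """Cancel adjacent inverse pairs; 'A' stands for a^-1 and 'B' for b^-1."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()           # x next to x^-1: cancel the pair
        else:
            stack.append(letter)
    return "".join(stack)

def multiply(u, v):
    """Product in the free group: concatenate, then cancel."""
    return reduce_word(u + v)

def inverse(word):
    """Inverse word: reverse the order and invert each letter."""
    return word[::-1].swapcase()

print(reduce_word("abBbaB"))          # abaB, that is, a b a b^-1
print(multiply("ab", inverse("ba")))  # abAB, a nonempty word
```

The second line of output is the commutator of a and b. Because it does not reduce to the empty word, ab and ba are different elements of the group.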

This space may seem somewhat artificial, but it is 
homotopy equivalent to the plane with two points 
removed, which appears in many contexts. More generally,
the fundamental group of the plane with d points
removed is the free group on d generators: this is a precise
sense in which the fundamental group measures
the number of holes.

Figure 7 Proof that $\pi_2$ of any space is Abelian.

In contrast with the fundamental group, the higher
homotopy groups $\pi_n(X)$ are Abelian when n is at least
2. Figure 7 gives a “proof without words” in the case 
n = 2, the proof being the same for any larger n. In 
the figure, we view the 2-sphere as the square with its 
boundary identified to a point. So any elements A and 
B of $\pi_2(X)$ are represented by continuous maps of the
square to X that map the boundary of the square to 
the base point x. The figure exhibits (several steps of) 
a homotopy from AB to BA, with the shaded regions 
and the boundary of the square all mapping to the 
base point x. The picture is reminiscent of the sim- 
plest nontrivial braid, in which one string is twisted 
around another; this is the beginning of a deep con- 
nection between algebraic topology and braid groups 
[III.4].

The fundamental group is especially powerful in low 
dimensions. For example, every compact connected 
surface (or two-dimensional manifold) is homeomorphic
to one of those on a standard list (see differential
topology [IV.7 §2.3]), and we compute that all the
manifolds on this list have different (nonisomorphic) 
fundamental groups. So, when you capture a closed sur- 
face in the wild, computing its fundamental group tells 
you exactly where it fits in the classification. Moreover, 
the geometric properties of the surface are closely tied 
to its fundamental group. The surfaces with a Riemannian
metric [I.3 §6.10] of positive curvature [III.13]
(the 2-sphere and real projective plane [I.3 §6.7]) are
exactly the surfaces with finite fundamental group; the 
surfaces with a metric of curvature zero (the torus and 
Klein bottle) are exactly the surfaces with a fundamen- 
tal group that is infinite but “almost Abelian” (there is 
an Abelian subgroup of finite index); and the remaining 
surfaces, those that have a metric of negative curvature, 
have “highly non-Abelian” fundamental group, like the 
free group (see figure 8). 

After more than a century of studying three-dimensional
manifolds, we now know, thanks to the advances
of Thurston and Perelman, that the picture is almost
the same for these as it is for 2-manifolds: the fundamental
group controls the geometric properties of
the 3-manifold almost completely (see differential
topology [IV.7 §2.4]). But this is completely untrue for
4-manifolds and in higher dimensions: there are many
different simply connected manifolds, meaning manifolds
with trivial fundamental group, and we need more
invariants to be able to distinguish between them. (To
begin with, the 4-sphere $S^4$ and the product $S^2 \times S^2$
are both simply connected. More generally, we can take
the connected sum of any number of copies of $S^2 \times S^2$,
obtained by removing 4-balls from these manifolds and
identifying the boundary 3-spheres. These 4-manifolds
are all simply connected, and yet no two of them are
homeomorphic or even homotopy equivalent.)

Figure 8 A sphere, a torus, and a surface of genus 2.
(Panels: sphere, one-holed torus, two-holed torus.)

An obvious approach to distinguishing different 
spaces would be to use higher homotopy groups, and 
indeed this works in simple cases. For example, $\pi_2$ of
the connected sum of r copies of $S^2 \times S^2$ is isomorphic
to $\mathbb{Z}^{2r}$. Also, we can show that the sphere $S^n$ of
any dimension is not contractible (although it is simply
connected for $n \geq 2$) by computing that $\pi_n(S^n)$ is isomorphic
to the integers (rather than the trivial group).
Thus, each continuous map from the n-sphere to itself 
determines an integer, called the degree of the map, 
which generalizes the notion of winding number for 
maps from the circle to itself. 

In general, however, the homotopy groups are not a 
practical way of distinguishing one space from another, 
because they are amazingly hard to compute. A first 
hint of this was Hopf’s 1931 discovery that $\pi_3(S^2)$
is isomorphic to the integers: it is clear that the 2-sphere
has a two-dimensional hole, as measured by
$\pi_2(S^2) \cong \mathbb{Z}$, but in what sense does it have a three-dimensional
hole? This does not correspond to our
naive view of what such a hole should be. The problem
of computing the homotopy groups of spheres turns
out to be one of the hardest in all of mathematics:
some of what we know is shown in table 1, but despite
massive efforts the homotopy groups $\pi_i(S^2)$, for example,
are known only for $i \leq 64$. There are tantalizing
patterns in these calculations, with a number-theoretic 
flavor, but it seems impossible to formulate a precise
guess for the homotopy groups of spheres in general.
And computing the homotopy groups for spaces more 
complex than spheres is even more complicated. 

To get an idea of the difficulties involved, let us define
the so-called Hopf map from $S^3$ to $S^2$, which turns out
to represent a nonzero element of $\pi_3(S^2)$. There are
in fact several equivalent definitions. One of them is to
regard a point $(x_1, x_2, x_3, x_4)$ in $S^3$ as a pair of complex
numbers $(z_1, z_2)$ such that $|z_1|^2 + |z_2|^2 = 1$. This we
do by setting $z_1 = x_1 + i x_2$ and $z_2 = x_3 + i x_4$. We then
map the pair $(z_1, z_2)$ to the complex number $z_1/z_2$.
This may not look like a map to $S^2$, but it is one: since
$z_2$ may be zero, the image of the map is in fact not $\mathbb{C}$
but the Riemann sphere $\mathbb{C} \cup \{\infty\}$, which can be identified
with $S^2$ in a natural way.
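
To make the first definition concrete, here is a minimal sketch of the Hopf map in coordinates. It rests on one assumption not spelled out in the text: the Riemann sphere is identified with the unit sphere in three-dimensional space by stereographic projection, with the point at infinity going to the north pole (0, 0, 1). Under that convention the ratio $z_1/z_2$ corresponds to the point computed below. The function name is illustrative.

```python
import cmath

def hopf(z1, z2):
    """Send (z1, z2) on S^3 to the point of S^2 matching z1/z2 on the Riemann sphere."""
    w = z1 * z2.conjugate()
    # Stereographic convention (assumed): z2 = 0, i.e. z1/z2 = infinity, maps to (0, 0, 1).
    return (2 * w.real, 2 * w.imag, abs(z1) ** 2 - abs(z2) ** 2)

z1, z2 = complex(0.6, 0.0), complex(0.0, 0.8)  # |z1|^2 + |z2|^2 = 1
p = hopf(z1, z2)

# Multiplying both coordinates by the same unit complex number leaves z1/z2
# unchanged, so the whole circle of such multiples maps to a single point.
phase = cmath.exp(1j * 0.7)
q = hopf(phase * z1, phase * z2)

print(p)
print(q)
```

The two printed points agree up to rounding, which illustrates that the set of points of $S^3$ mapping to a given point of $S^2$ is a circle.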

Another way of defining the Hopf map is to regard
points $(x_1, x_2, x_3, x_4)$ in $S^3$ as unit quaternions. In the
article on quaternions in this volume [III.78], it is shown
that each unit quaternion can be associated with a rotation
of the sphere. If we fix some point s in the sphere
and map each unit quaternion to the image of s under
the associated rotation, then we get a map from $S^3$ to
$S^2$ that is homotopic to the map defined in the previous
paragraph. 

The Hopf map is an important construction, and will 
reappear more than once later in this article. 

4 Homology Groups and 
the Cohomology Ring 

Homotopy groups, then, can be rather mysterious and 
very hard to calculate. Fortunately, there is a different 
way to measure the number of holes in a topological 
space: homology and cohomology groups. The defini- 
tions are more subtle than the definition of homotopy 
groups, but the groups turn out to be easier to compute
and are for this reason much more commonly used.

Recall that elements of the nth homotopy group
$\pi_n(X)$ of a topological space X are represented by
continuous maps from the n-sphere to X. Let X be a
manifold, for simplicity. There are two key differences
between homotopy groups and homology groups. The
first is that the basic objects of homology are more
general than n-dimensional spheres: every closed oriented
n-dimensional submanifold A of X determines
an element of the nth homology group of X, $H_n(X)$.
This might make homology groups seem much bigger
than homotopy groups, but that is not the case,
because of the second major difference between homotopy
and homology. As with homotopy, the elements of
the homology groups are not the submanifolds themselves
but equivalence classes of submanifolds, but
the definition of the equivalence relation for homology
makes it much easier for two of these submanifolds
to be equivalent than it is for two spheres to be
homotopic.

Table 1 The first few homotopy groups of spheres.

        S^1  S^2      S^3      S^4          S^5      S^6      S^7      S^8  S^9
π_1     Z    0        0        0            0        0        0        0    0
π_2     0    Z        0        0            0        0        0        0    0
π_3     0    Z        Z        0            0        0        0        0    0
π_4     0    Z/2      Z/2      Z            0        0        0        0    0
π_5     0    Z/2      Z/2      Z/2          Z        0        0        0    0
π_6     0    Z/4×Z/3  Z/4×Z/3  Z/2          Z/2      Z        0        0    0
π_7     0    Z/2      Z/2      Z×Z/4×Z/3    Z/2      Z/2      Z        0    0
π_8     0    Z/2      Z/2      Z/2×Z/2      Z/8×Z/3  Z/2      Z/2      Z    0
π_9     0    Z/3      Z/3      Z/2×Z/2      Z/2      Z/8×Z/3  Z/2      Z/2  Z
π_10    0    Z/3×Z/5  Z/3×Z/5  Z/8×Z/3×Z/3  Z/2      0        Z/8×Z/3  Z/2  Z/2

We shall not give a formal definition of homology, but 
here are some examples that convey some of its flavor. 
Let X be the plane with the origin removed and let A be 
a circle that goes around the origin. If we continuously 
deform this circle, we will obtain a new curve that is 
homotopic to the original circle, but with homology we 
can do more. For instance, we can start with a continu- 
ous deformation that causes two of its points to touch 
and turns it into a figure eight. One half of this figure 
eight will have to contain the origin, but we can leave 
that still and slide the other part away. The result is 
then two closed curves, with the origin inside one and 
outside the other. This pair of curves, which together 
form a 1-manifold with two components, is equivalent 
to the original circle. It can be seen as a continuous 
deformation of a more general kind. 

A second example shows how natural it is to include 
other manifolds in the definition of homology. This 
time let X be $\mathbb{R}^3$ with a circle removed, and let A be a
sphere that contains the circle in its interior. Suppose
that the circle is in the xy-plane and that both it and
the sphere A are centered at the origin. Then we can
pinch the top and bottom of A toward the origin until
they just touch. If we do so, then we obtain a shape
that looks like a torus, except that the hole in the middle
has been shrunk to zero. But we can open up this
hole with the help of a further continuous deformation
and obtain a genuine torus, which is a “tube” around
the original circle. From the point of view of homology,
this torus is equivalent to the sphere A.

Figure 9 The circle A represents zero
in the homology of the surface.

A more general rule is that if X is a manifold and B is
a compact oriented (n + 1)-dimensional submanifold
of X with a boundary, then this boundary $\partial B$ will be
equivalent to zero (which is the same as saying that
$[\partial B] = 0$ in $H_n(X)$): see figure 9.

The group operation is easy to define: if A and B are 
two disjoint submanifolds of X, giving rise to homology
classes [A] and [B], then [A] + [B] is the homology
class $[A \cup B]$. (More generally, the definition of
homology allows us to add up any collection of sub- 
manifolds, whether or not they overlap.) Here are some 
simple examples of homology groups, which, unlike 
the fundamental group, are always Abelian. The homology
groups of a sphere, $H_i(S^n)$, are isomorphic to the
integers Z for i = 0 and for i = n, and 0 otherwise. 
This contrasts with the complicated homotopy groups 
of the sphere, and better reflects the naive idea that 
the n-sphere has one n-dimensional hole and no other 
holes. Note that the fundamental group of the circle, 
the group of integers, is the same as its first homology 
group. More generally, for any path-connected space,
the first homology group is always the “Abelianization”
of the fundamental group (which is formally defined to 
be its largest Abelian quotient). For example, the funda- 
mental group of the plane with two points removed is 
the free group on two generators, while the first homol- 
ogy group is the free Abelian group on two generators, 
or $\mathbb{Z}^2$.
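
For the free group on two generators, Abelianization can be seen very concretely: a word is sent to the pair of net exponents of a and b. The sketch below uses an illustrative encoding that is not from the text, in which the uppercase letters A and B stand for the inverses of a and b.

```python
def exponent_sums(word):
    """Image of a free-group word in Z^2: (net power of a, net power of b)."""
    return (word.count("a") - word.count("A"),
            word.count("b") - word.count("B"))

print(exponent_sums("ab"))    # (1, 1)
print(exponent_sums("ba"))    # (1, 1)
print(exponent_sums("abAB"))  # (0, 0)
```

The words ab and ba are different in the fundamental group but have the same image in first homology, and the commutator $aba^{-1}b^{-1}$ maps to zero.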

The homology groups of the two-dimensional torus,
$H_i(S^1 \times S^1)$, are isomorphic to $\mathbb{Z}$ for i = 0, to $\mathbb{Z}^2$ for i = 1,
and to $\mathbb{Z}$ for i = 2. All of this has geometric meaning.
The zeroth homology group of a space X with r connected
components is isomorphic to $\mathbb{Z}^r$. So
the fact that the zeroth homology group of the torus is
isomorphic to $\mathbb{Z}$ means that the torus is connected. Any
closed loop in the torus determines an element of the
first homology group $\mathbb{Z}^2$, which measures how many
times the loop winds around the meridian and longitudinal
directions of the torus. And finally, the homology
of the torus in dimension 2 is isomorphic to $\mathbb{Z}$ because
the torus is a closed orientable manifold. That tells us
that the whole torus defines an element of the second
homology group of the torus, which is in fact a generator
of that group. By contrast, the homotopy group
$\pi_2(S^1 \times S^1)$ is the trivial group: there are no interesting
maps from the 2-sphere to the 2-torus, but homology
shows that there are interesting maps from other
closed 2-manifolds to the 2-torus.

As we have mentioned, calculating homology groups
is much easier than calculating homotopy groups. The
main reason for this is the existence of results that tell
you the homology groups of a space that is built up
from smaller pieces in terms of the homology groups of
those pieces and their intersections. Another important
property of homology groups is that they are “functorial”
in the sense that a continuous map f from a space
X to a space Y leads in a natural way to a homomorphism
$f_*$ from $H_i(X)$ to $H_i(Y)$ for each i: $f_*([A])$ is
defined to be $[f(A)]$. In other words, $f_*([A])$ is the
equivalence class of the image of A under f.

We can define the closely related idea of “cohomology”
simply by a different numbering. Let X be
a closed oriented n-dimensional manifold. Then we
define the ith cohomology group $H^i(X)$ to be the
homology group $H_{n-i}(X)$. Thus, one way to write down
a cohomology class (an element of $H^i(X)$) is by choosing
a closed oriented submanifold S of codimension i
in X. (This means that the dimension of S is n - i.) We
write [S] for the corresponding cohomology class.

For more general spaces than manifolds, cohomology
is not just a simple renumbering of homology. Informally,
if X is a topological space, then we think of an
element of $H^i(X)$ as being represented by a codimension-i
subspace of X that can move around freely in
X. For example, suppose that f is a continuous map
from X to an i-dimensional manifold. If X is a manifold
and f is sufficiently “well-behaved,” then the inverse
image of a “typical” point in the manifold will be an
i-codimensional submanifold of X, and as we move the
point about, this submanifold will vary continuously,
and will do so in a way that is similar to the way that a
circle became two circles and a sphere became a torus
earlier. If X is a more general topological space, the map
f still determines a cohomology class in $H^i(X)$, which
we think of as being represented by the inverse image
in X of any point in the manifold.

However, even when X is an oriented n-dimensional
manifold, cohomology has distinct advantages over
homology. This may seem odd, since the cohomology
groups are the homology groups with different names.
However, this renumbering allows us to give very useful
extra algebraic structure to the cohomology groups of
X: not only can we add cohomology classes, we can multiply
them as well. Furthermore, we can do so in such a
way that, taken together, the cohomology groups of X
form a ring [III.83 §1]. (Of course, we could do this for
the homology groups, but the cohomology groups form
a so-called graded ring. In particular, if $[A] \in H^i(X)$
and $[B] \in H^j(X)$, then $[A] \cdot [B] \in H^{i+j}(X)$.)

The multiplication of cohomology classes has a rich 
geometric meaning, especially on manifolds: it is given 
by the intersection of two submanifolds. This gener- 
alizes our discussion of intersection numbers in sec- 
tion 1: there we considered zero-dimensional intersec- 
tions of submanifolds, whereas we are now considering 
(cohomology classes of) higher-dimensional intersec- 
tions. To be precise, let S and T be closed oriented sub- 
manifolds of X, of codimension i and j, respectively. 
By moving S slightly (which does not change its class 
in $H^i(X)$) we can assume that S and T intersect transversely,
which implies that the intersection of S and
T is a smooth submanifold of codimension i + j in X.
Then the product of the cohomology classes [S] and
[T] is simply the cohomology class of the intersection
$S \cap T$ in $H^{i+j}(X)$. (In addition, the submanifold $S \cap T$
inherits an orientation from S, T, and X: this is needed 
to define the associated cohomology class.) 

As a result, to compute the cohomology ring of a
manifold, it is enough to specify a basis for the cohomology
groups (which, as we have already discussed, are
relatively easy to determine) using some submanifolds
and to see how these submanifolds intersect. For example,
we can compute the cohomology ring of the 2-torus
as shown in figure 10. For another example, it
is not hard to show that the cohomology of the complex
projective plane [III.74] $\mathbb{CP}^2$ has a basis given
by three basic submanifolds: a point, which belongs
to $H^4(\mathbb{CP}^2)$ because it is a submanifold of codimension 4;
a complex projective line $\mathbb{CP}^1 = S^2$, which
belongs to $H^2(\mathbb{CP}^2)$; and the whole manifold $\mathbb{CP}^2$, which
is in $H^0(\mathbb{CP}^2)$ and represents the identity element 1 of
the cohomology ring. The product in the cohomology
ring is described by saying that $[\mathbb{CP}^1][\mathbb{CP}^1] = [\text{point}]$,
because any two distinct lines $\mathbb{CP}^1$ in the plane meet
transversely in a single point.

Figure 10 $A^2 = A \cdot A' = 0$, $A \cdot B = [\text{point}]$,
and $B^2 = B \cdot B' = 0$.

This calculation of the cohomology ring of the com- 
plex projective plane, although very simple, has several 
strong consequences. First of all, it implies Bézout’s 
theorem on intersections of complex algebraic curves 
(see algebraic geometry [IV.4 §6]). An algebraic curve 
of degree d in $\mathbb{CP}^2$ represents d times the class of a line
$\mathbb{CP}^1$ in $H^2(\mathbb{CP}^2)$. Therefore, if two algebraic curves D
and E of degrees d and e meet transversely, then the
cohomology class $[D \cap E]$ equals

$$[D] \cdot [E] = (d[\mathbb{CP}^1])(e[\mathbb{CP}^1]) = de[\text{point}].$$

For complex submanifolds of a complex manifold,
intersection numbers are always +1, not -1, and so this
means that D and E meet in exactly de points.
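
The smallest interesting case, d = 2 and e = 1, can be checked by hand or by machine. The sketch below is an illustration only, with a curve and parameter names chosen for the example: it intersects the conic $x^2 + y^2 = 1$ with a line $y = mx + c$ over the complex numbers. It assumes that m is real, so that the two curves share no point at infinity, and that the discriminant is nonzero, which is the transversality condition here.

```python
import cmath

def conic_line_intersections(m, c):
    """Complex points where x^2 + y^2 = 1 meets y = m*x + c (m real, assumed)."""
    # Substituting the line into the conic gives
    # (1 + m^2) x^2 + 2 m c x + (c^2 - 1) = 0.
    a, b, k = 1 + m * m, 2 * m * c, c * c - 1
    root = cmath.sqrt(b * b - 4 * a * k)
    xs = [(-b + root) / (2 * a), (-b - root) / (2 * a)]
    return [(x, m * x + c) for x in xs]

# This line misses the circle in the real plane but still meets it in
# two complex points, matching de = 2 * 1 = 2.
for point in conic_line_intersections(1.0, 2.0):
    print(point)
```

Working over the complex numbers is essential: over the reals the same line and circle would have no common points at all.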

We can also use the computation of the cohomology
ring of $\mathbb{CP}^2$ to prove something about the homotopy
groups of spheres. It turns out that $\mathbb{CP}^2$ can be constructed
as the union of the 2-sphere and the closed
four-dimensional ball, with each point of the boundary
$S^3$ of the ball identified with a point in $S^2$ by the Hopf
map, which was defined in the previous section.

A constant map from one space to another, or a map
homotopic to a constant map, gives rise to the zero
homomorphism between the homology groups $H_i$, at
least when i > 0. The Hopf map $f : S^3 \to S^2$ also
induces the zero homomorphism because the nonzero
homology groups of $S^3$ and $S^2$ are in different dimensions.
Nonetheless, we will show that f is not homotopic
to the constant map. If it were, then the space
$\mathbb{CP}^2$ obtained by attaching a 4-ball to the 2-sphere using
the map f would be homotopy equivalent to the space
obtained by attaching a 4-ball to the 2-sphere using a
constant map. The latter space Y is the union of $S^2$ and
$S^4$ identified at one point. But in fact Y is not homotopy
equivalent to the complex projective plane, because
their cohomology rings are not isomorphic. In particular,
the product of any element of $H^2(Y)$ with itself is
zero, unlike what happens in $\mathbb{CP}^2$ where $[\mathbb{CP}^1][\mathbb{CP}^1] = [\text{point}]$.
Therefore f is nonzero in $\pi_3(S^2)$. A more careful
version of this argument shows that $\pi_3(S^2)$ is isomorphic
to the integers, and the Hopf map $f : S^3 \to S^2$
is a generator of this group.

This argument shows some of the rich relations 
between all the basic concepts of algebraic topology: 
homotopy groups, cohomology rings, manifolds, and 
so on. To conclude, here is a way to visualize the nontriviality
of the Hopf map $f : S^3 \to S^2$. Look at the subset
of $S^3$ that maps to any given point of the 2-sphere.
These inverse images are all circles in the 3-sphere. To
draw them, we can use the fact that $S^3$ minus a point
is homeomorphic to $\mathbb{R}^3$; so these inverse images form a
family of disjoint circles that fills up three-dimensional
space, with one circle being drawn as a line (the circle
through the point we removed from $S^3$). The striking
feature of this picture is that any two of this huge family
of circles have linking number 1 with each other:
there is no way to pull any two of them apart (see
figure 11).

5 Vector Bundles and Characteristic Classes

We now introduce another major topological idea: fiber 
bundles. If E and B are topological spaces, x is a point 
in B, and $p : E \to B$ is a continuous map, then the fiber
of p over x is the subspace of E that maps to x. We say
that p is a fiber bundle, with fiber F, if every fiber of p is
homeomorphic to the same space F. We call B the base
space and E the total space. For example, any product
space $B \times F$ is a fiber bundle over B, called the trivial
F-bundle over B. (The continuous map in this case is the
map that takes (x, y) to x.) But there are many nontrivial
fiber bundles. For example, the Möbius strip is a fiber
bundle over the circle with fiber a closed interval. This 
example helps to explain the old name “twisted prod- 
uct” for fiber bundles. Another example: the Hopf map 
makes the 3-sphere the total space of a circle bundle 
over the 2-sphere. 

Fiber bundles are a fundamental way to build up com- 
plicated spaces from simple pieces. We will focus on the 
most important special case: vector bundles. A vector 
bundle over a space B is a fiber bundle $p : E \to B$ whose
fibers are all real vector spaces of some dimension n.
This dimension is called the rank of the vector bundle.
A line bundle means a vector bundle of rank 1; for
example, we can view the Möbius strip (not including
its boundary) as a line bundle over the circle $S^1$. It is a
nontrivial line bundle; that is, it is not isomorphic to the
trivial line bundle $S^1 \times \mathbb{R}$. (There are many ways of constructing
it: one is to take the strip $\{(x, y) : 0 \leq x \leq 1\}$
and identify each point (0, y) with the point (1, -y).
The base space of this line bundle is the set of all points
(x, 0), which is a circle since (0, 0) and (1, 0) have been
identified.)

If M is a smooth manifold of dimension n, its tangent
bundle $TM \to M$ is a vector bundle of rank n. We can
easily define this bundle by considering M as a submanifold
of some Euclidean space $\mathbb{R}^N$. (Every smooth
manifold can be embedded into Euclidean space.) Then
TM is the subspace of $M \times \mathbb{R}^N$ of pairs (x, v) such that
the vector v is tangent to M at the point x; the map
$TM \to M$ sends a pair (x, v) to the point x. The fiber
over x then has the form of the set of all pairs (x, v)
with v belonging to an affine subspace of $\mathbb{R}^N$ of dimension
equal to that of M. For any fiber bundle, a section
means a continuous map from the base space B to the
total space E that maps each point x in B to some point
in the fiber over x. A section of the tangent bundle of
a manifold is called a vector field. We can draw a vector
field on a given manifold by putting an arrow (possibly
of zero length) at every point of the manifold.

Figure 12 Trivializations of the tangent
bundle for the circle and the torus.

Figure 13 The hairy ball theorem.

In order to classify smooth manifolds, it is impor- 
tant to study their tangent bundles, and in particular 
to see whether they are trivial or not. Some manifolds, 
like the circle $S^1$ and the torus $S^1 \times S^1$, do have trivial
tangent bundle. The tangent bundle of an n-manifold
M is trivial if and only if we can find n vector fields that 
are linearly independent at every point of M. So we can 
prove that the tangent bundle is trivial just by writing 
down such vector fields; see figure 12 for the circle or 
the torus. But how can we show that the tangent bundle 
of a given manifold is nontrivial? 

One way is to use intersection numbers. Let M be a
closed oriented n-manifold. We can identify M with the
image of the “zero-section” inside the tangent bundle
TM, the section that assigns to every point of M the
zero vector at that point. Since the dimension of TM is
precisely double that of M, the discussion of intersection
numbers in section 1 gives a well-defined integer
$M^2 = M \cdot M$, the self-intersection number of M inside
TM; this is called the Euler characteristic $\chi(M)$. By the
definition of intersection numbers, for any vector field 
v on M that meets the zero-section transversely, the 
Euler characteristic of M is equal to the number of zeros 
of v, counted with signs. 

As a result, if the Euler characteristic of M is not 
zero, then every vector field on M must meet the zero- 
section; in other words, every vector field on M must 
equal zero somewhere. The simplest example occurs
when M is the 2-sphere $S^2$. We can easily write down
a vector field (for example, the one pointing toward
the east along circles of latitude, which vanishes at
the north and south poles) whose intersection number
with the zero-section is 2. Therefore the 2-sphere has
Euler characteristic 2, and so every vector field on the
2-sphere must vanish somewhere. This is a famous theorem
of topology known as the “hairy ball theorem”:
it is impossible to comb the hair on a coconut (see
figure 13).
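
The count of zeros "with signs" for the eastward field can be reproduced numerically. The sketch below is a rough illustration rather than a definition from the text. It assumes the explicit formula (-y, x, 0) for the eastward field on the unit sphere, and it measures the sign of each zero as the winding number of the field around a small loop, read in the chart (x, sign · y) near the pole z = sign.

```python
import math

def east_field(x, y, z):
    """Vector field pointing east along circles of latitude on the unit sphere."""
    return (-y, x, 0.0)

def index_at_pole(sign, samples=360, radius=1e-3):
    """Winding number of the field around the pole z = sign, read in a local chart."""
    total = 0.0
    prev = None
    for k in range(samples + 1):
        t = 2 * math.pi * k / samples
        u, v = radius * math.cos(t), radius * math.sin(t)
        # Chart (u, v) = (x, sign * y); with the outward normal this is
        # orientation preserving at both poles (an assumed convention).
        x, y = u, sign * v
        z = sign * math.sqrt(1 - x * x - y * y)
        fx, fy, _ = east_field(x, y, z)
        angle = math.atan2(sign * fy, fx)  # field direction in chart coordinates
        if prev is not None:
            d = angle - prev
            d = (d + math.pi) % (2 * math.pi) - math.pi  # wrap the step into [-pi, pi)
            total += d
        prev = angle
    return round(total / (2 * math.pi))

print(index_at_pole(1) + index_at_pole(-1))  # 2
```

Each pole contributes +1, so the signed count of zeros is 2, in agreement with the Euler characteristic of the 2-sphere.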

This is the beginning of the theory of characteristic 
classes, which measure how nontrivial a given vector 
bundle is. There is no need to restrict ourselves to the 
tangent bundle of a manifold. For any oriented vector 
bundle E of rank n on a topological space X, we can 
define a cohomology class $\chi(E)$ in $H^n(X)$, the Euler
class, which vanishes if the bundle is trivial. Intuitively, 
the Euler class of E is the cohomology class represented 
by the zero set of a general section of E, which (for 
example, if X is a manifold) should be a codimension- 
n submanifold of X, since X has codimension n in E. 
If X is a closed oriented n-manifold, then the Euler 
class of the tangent bundle in $H^n(X) = \mathbb{Z}$ is the Euler
characteristic of X. 

One inspiration for the theory of characteristic 
classes was the Gauss-Bonnet theorem, generalized to 
all dimensions in the 1940s. The theorem expresses the 
Euler characteristic of a closed manifold with a Rieman- 
nian metric as the integral over the manifold of a cer- 
tain curvature function. More broadly, a central goal 
of differential geometry is to understand how the geo- 
metric properties of a Riemannian manifold such as its 
curvature are related to the topology of the manifold. 

The characteristic classes for complex vector bundles 
(that is, bundles where the fibers are complex vector 
spaces) turn out to be particularly convenient: indeed, 
real vector bundles are often studied by constructing 
the associated complex vector bundle. If E is a complex
vector bundle of rank n over a topological space X,
the Chern classes of E are a sequence $c_1(E), \ldots, c_n(E)$
of cohomology classes on X, with $c_i(E)$ belonging to
$H^{2i}(X)$, which all vanish if the bundle is trivial. The
top Chern class, $c_n(E)$, is simply the Euler class of E:
thus, it is the first obstruction to finding a section of
E that is everywhere nonzero. The more general Chern
classes have a similar interpretation. For any $1 \leq j \leq n$,
choose j general sections of E. The subset of X over
which these sections become linearly dependent will
have codimension 2(n + 1 - j) (assuming, for example,
that X is a manifold). The Chern class $c_{n+1-j}(E)$ is precisely
the cohomology class of this subset. Thus the
Chern classes measure in a natural way the failure of a
given complex vector bundle to be trivial. The Pontryagin
classes of a real vector bundle are defined to be the
Chern classes of the associated complex vector bundle.

A triumph of differential topology is Sullivan’s 1977 
theorem that there are only finitely many smooth 
closed simply connected manifolds of dimension at 
least 5 with any given homotopy type and given Pon- 
tryagin classes of the tangent bundle. This statement 
fails badly in dimension 4, as Donaldson discovered in 
the 1980s (see differential topology [IV.7 §2.5]). 

6 K-Theory and Generalized
Cohomology Theories

The effectiveness of vector bundles in geometry led to 
a new way of measuring the “holes” in a topological 
space X: looking at how many different vector bundles 
over X there are. This idea gives a simple way to define 
a cohomology-like ring associated to any space, known 
as K-theory (after the German word “Klasse,” since the
theory involves equivalence classes of vector bundles).
It turns out that K-theory gives a very useful new angle
by which to look at topological spaces. Some problems
that could be solved only with enormous effort using
ordinary cohomology became easy with K-theory. The
idea was created in algebraic geometry by Grothendieck 
in the 1950s and then brought into topology by Atiyah 
and Hirzebruch in the 1960s. 

The definition of K-theory can be given in a few lines.
For a topological space X, we define an Abelian group
$K^0(X)$, the K-theory of X, whose elements can be writ-
ten as formal differences [E] - [F], where E and F are
any two complex vector bundles over X. The only rela-
tions we impose in this group are that $[E \oplus F] = [E] + [F]$
for any two vector bundles E and F over X. Here $E \oplus F$
denotes the direct sum of the two bundles; if $E_x$ and $F_x$
denote the fibers at a given point x in X, the fiber of
$E \oplus F$ at x is simply $E_x \times F_x$.

This simple definition leads to a rich theory. First of 
all, the Abelian group $K^0(X)$ is in fact a ring: we mul-
tiply two vector bundles on X by forming the tensor
product [III.91]. In this respect, K-theory behaves like
ordinary cohomology. The analogy suggests that the
group $K^0(X)$ should form part of a whole sequence of
Abelian groups $K^i(X)$, for integers i, and indeed these
groups can be defined. In particular, $K^{-1}(X)$ can be
defined as the subgroup of those elements of $K^0(S^1 \times X)$
whose restriction to $K^0(\text{point} \times X)$ is zero.

Then a miracle occurs: the groups $K^i(X)$ turn out to
be periodic of order 2: $K^i(X) = K^{i+2}(X)$ for all integers
i. This is a famous phenomenon known as Bott peri-
odicity. So there are really only two different K-groups
attached to any topological space: $K^0(X)$ and $K^1(X)$.

This may suggest that K-theory contains less infor-
mation than ordinary cohomology, but that is not so.
Neither K-theory nor ordinary cohomology determines
the other, although there are strong relations between
them. Each brings different aspects of the shape of a
space to the fore. Ordinary cohomology, with its num-
bering, shows fairly directly the way a space is built
up from pieces of different dimensions. K-theory, hav-
ing only two different groups, looks cruder at first (and
is often easier to compute as a result). But geometric
problems involving vector bundles often involve infor-
mation that is subtle and hard to extract from ordinary
cohomology, whereas this information is brought to the
surface by K-theory.

The basic relation between K-theory and ordinary
cohomology is that the group $K^0(X)$ constructed from
the vector bundles on X “knows” something about all
the even-dimensional cohomology groups of X. To be
precise, the rank of the Abelian group $K^0(X)$ is the sum
of the ranks of all the even-dimensional cohomology
groups $H^{2i}(X)$. This connection comes from associat-
ing with a given vector bundle on X its Chern classes.
The odd K-group $K^1(X)$ is related in the same way to
the odd-dimensional ordinary cohomology. 

As we have already hinted, the precise group $K^0(X)$,
as opposed to just its rank, is better adapted to some 
geometric problems than ordinary cohomology. This 
phenomenon shows the power of looking at geomet- 
ric problems in terms of vector bundles, and thus ulti- 
mately in terms of linear algebra. Among the classic 
applications of K-theory is the proof, by Bott, Ker-
vaire, and Milnor, that the 0-sphere, the 1-sphere, the
3-sphere, and the 7-sphere are the only spheres whose 
tangent bundles are trivial. This has a deep algebraic 
consequence, in the spirit of the fundamental theorem 
of algebra: the only dimensions in which there can be 
a real division algebra (not assumed to be commuta- 
tive or even associative) are 1, 2, 4, and 8. There are 
indeed division algebras of all four types: the real num- 
bers, complex numbers, quaternions, and octonions 
(see QUATERNIONS, OCTONIONS, AND NORMED DIVISION 
ALGEBRAS [III.78]).

Let us see why the existence of a real division alge-
bra of dimension n implies that the (n - 1)-sphere has
trivial tangent bundle. In fact, let us merely assume that
we have a finite-dimensional real vector space V with a
bilinear map $V \times V \to V$, which we call the “product,”
such that if x and y are vectors in V with xy = 0,
then either x = 0 or y = 0. For convenience, let us
also assume that there is an identity element 1 in V,
so $1 \cdot x = x \cdot 1 = x$ for all $x \in V$; one can, how-
ever, do without this assumption. If V has dimension
n, then we can identify V with $\mathbb{R}^n$. Then, for each point
x in the sphere $S^{n-1}$, left multiplication by x gives a
linear isomorphism from $\mathbb{R}^n$ to itself. By scaling the
output to have length 1, left multiplication by x gives
a diffeomorphism from $S^{n-1}$ to itself which maps the
point 1 (scaled to have length 1) to x. Taking the deriva-
tive of this diffeomorphism at the point 1 gives a lin- 
ear isomorphism from the tangent space of the sphere 
at the point 1 to the tangent space at x. Since the 
point x on the sphere is arbitrary, a choice of basis for 
the tangent space of the sphere at the point 1 deter- 
mines a trivialization of the whole tangent bundle of 
the (n - 1)-sphere.
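
The argument can be made concrete for n = 4, taking V to be the quaternions. The following is only a minimal sketch under that assumption; the helper names (qmul, frame_at, random_unit_quaternion) are invented for illustration. Because the quaternion norm is multiplicative, no rescaling is needed, and the derivative of left multiplication by a unit quaternion x simply sends the tangent vectors i, j, k at the point 1 to the vectors xi, xj, xk at the point x.

```python
import math
import random


def qmul(a, b):
    """Hamilton product of two quaternions written as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def dot(u, v):
    """Euclidean inner product on R^4."""
    return sum(p * q for p, q in zip(u, v))


def frame_at(x):
    """Images of the basis i, j, k of the tangent space at 1 under
    left multiplication by the unit quaternion x."""
    basis_at_one = ((0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))
    return [qmul(x, e) for e in basis_at_one]


def random_unit_quaternion(rng):
    """A point of the 3-sphere, obtained by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(dot(v, v))
    return tuple(c / norm for c in v)


if __name__ == "__main__":
    rng = random.Random(0)
    x = random_unit_quaternion(rng)
    for f in frame_at(x):
        # Each frame vector should be (numerically) orthogonal to x,
        # that is, tangent to the sphere at x.
        print(f, dot(x, f))
```

At every point x the three vectors returned by frame_at are tangent to the 3-sphere and linearly independent, which is what a trivialization of the tangent bundle requires.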

Among other applications, K-theory provides the
best “explanation” for the low-dimensional homotopy 
groups of spheres, and in particular for the number- 
theoretic patterns that are seen there. Notably, denom- 
inators of Bernoulli numbers appear among those 
groups (such as $\pi_{n+3}(S^n) \cong \mathbb{Z}/24$ for n at least 5), and
this pattern was explained using K-theory by Milnor,
Kervaire, and Adams. 

THE ATIYAH-SINGER INDEX THEOREM [V.2] provides 
a deep analysis of linear differential equations on 
closed manifolds using K-theory. The theorem has
made K-theory important for gauge theories and string
theories in physics. K-theory can also be defined for
noncommutative rings, and is in fact the central con-
cept in “noncommutative geometry” (see OPERATOR
ALGEBRAS [IV.15 §5]).

The success of K-theory led to a search for other
“generalized cohomology theories.” There is one other 
theory that stands out for its power: complex cobor- 
dism. The definition is very geometric: the complex 
cobordism groups of a manifold M are generated by 
mappings of manifolds (with a complex structure on 
the tangent bundle) into M. The relations say that any 
manifold counts as zero if it is the boundary of some 
other manifold. For example, the union of two circles 
would count as zero if you could find a cylinder whose 
ends were those circles. 

It turns out that complex cobordism is much richer
than either K-theory or ordinary cohomology. It sees
far into the structure of a topological space, but at 
the cost of being difficult to compute. Over the past 
thirty years, a whole series of cohomology theories,
such as elliptic cohomology and Morava K-theories,
have been constructed as “simplifications” of complex 
cobordism: there is a constant tension in topology 
between invariants that carry a lot of information and 
invariants that are easy to compute. In one direction, 
complex cobordism and its variants provide the most 
powerful tool for the computation and understand- 
ing of the homotopy groups of spheres. Beyond the 
range where Bernoulli numbers appear, we see deeper 
number theory such as modular forms [III.61]. In 
another direction, the geometric definition of complex 
cobordism makes it useful in algebraic geometry. 

7 Conclusion 

The line of thought introduced by pioneering topolo-
gists like Riemann [VI.49] is simple but powerful. Try to
translate any problem, even a purely algebraic one, into 
geometric terms. Then ignore the details of the geom- 
etry and study the underlying shape or topology of the 
problem. Finally, go back to the original problem and 
see how much has been gained. The fundamental topo- 
logical ideas such as cohomology are used throughout 
mathematics, from number theory to string theory. 

Further Reading 

From the definition of topological spaces to the fun- 
damental group and a little beyond, I like M. A. Arm- 
strong’s Basic Topology (Springer, New York, 1983). 
The current standard graduate textbook is A. Hatcher’s 
Algebraic Topology (Cambridge University Press, Cam- 
bridge, 2002). Two of the great topologists, Bott and 
Milnor, are also brilliant writers. Every young topolo- 
gist should read R. Bott and L. Tu’s Differential Forms 
in Algebraic Topology (Springer, New York, 1982), J. Mil-
nor’s Morse Theory (Princeton University Press, Prince-
ton, NJ, 1963), and J. Milnor and J. Stasheff’s Character- 
istic Classes (Princeton University Press, Princeton, NJ, 
1974). 


IV.7 Differential Topology

C. H. Taubes 


1 Smooth Manifolds 

This article is about classifying certain objects called 
smooth manifolds, so I need to start by telling you what 
they are. A good example to keep in mind is the sur-
face of a smooth ball. If you look at a small portion of
it from very close up, then it looks like a portion of a
flat plane, but of course it differs in a radical way from
a flat plane on larger distance scales. This is a general 
phenomenon: a smooth manifold can be very convo- 
luted, but must be quite regular in close-up. This “local 
regularity” is the condition that each point in a mani- 
fold belongs to a neighborhood that looks like a portion 
of standard Euclidean space in some dimension. If the 
dimension in question is d for every point of the mani- 
fold, then the manifold itself is said to have dimension 
d. A schematic of this is shown in figure 1. 

What does it mean to say that a neighborhood “looks
like a portion of standard Euclidean space”? It means
that there is a “nice” one-to-one map $\phi$ from the neigh-
borhood into $\mathbb{R}^d$ (with its usual notion of distance (see
metric spaces [III.58])). One can think of $\phi$ as “iden-
tifying” points in the neighborhood with points in $\mathbb{R}^d$:
that is, x is identified with $\phi(x)$. If we do this, then the
function $\phi$ is called a coordinate chart of the neighbor-
hood, and any chosen basis for the linear functions on
the Euclidean space is called a coordinate system. The
reason for this is that $\phi$ allows us to use the coordinates
in $\mathbb{R}^d$ to label points in the neighborhood: if x belongs
to the neighborhood, then one can label it with the
coordinates of $\phi(x)$. For example, Europe is part of the
surface of a sphere. A typical map of Europe identifies 
each point in Europe with a point in flat, two-dimen- 
sional Euclidean space, that is, a square grid labeled 
with latitude and longitude. These two numbers give 
us a coordinate system for the map, which can also be 
transferred to a coordinate system for Europe itself. 

Now, here is a straightforward but central observa-
tion. Suppose that M and N are two neighborhoods that
intersect, and suppose that functions $\phi : M \to \mathbb{R}^d$ and
$\psi : N \to \mathbb{R}^d$ are used to give them each a coordinate
chart. Then the intersection $M \cap N$ is given two coordi-
nate charts, and this gives us an identification between
the open regions $\phi(M \cap N)$ and $\psi(M \cap N)$ of $\mathbb{R}^d$: given
a point x in the first region, the corresponding point
in the second is $\psi(\phi^{-1}(x))$. This composition of maps
is called a transition function, and it tells you how the
coordinates from one of the charts on the intersecting
region relate to those of the other. The transition func-
tion is a homeomorphism [III.92] between the regions
$\phi(M \cap N)$ and $\psi(M \cap N)$.

Suppose that you take a rectangular grid in the first 
Euclidean region and use the transition function $\psi \circ \phi^{-1}$
to map it to the second one. It is possible that the image 
will again be a rectangular grid, but in general it will be 
somewhat distorted. An illustration is given in figure 2.

Figure 1 Small portions of a manifold
resemble regions in a Euclidean space. 



Figure 2 A transition function from a rectangular 
grid to a distorted rectangular grid.


The proper term for a space whose points are sur- 
rounded by regions that can be identified with parts 
of Euclidean space is a topological manifold. The word 
“topological” is used in order to indicate that there 
are no constraints on the coordinate-chart transition 
functions apart from the basic one that they should
be continuous. However, some continuous functions 
are quite unpleasant, so one typically introduces extra 
constraints in order to limit the distorting effect that 
the transition functions can have on a rectangular 
coordinate grid. 

Of prime interest here is the case where the transition 
functions are required to be differentiable to all orders. 
If a manifold has a collection of charts for which all the
transition functions are infinitely differentiable, then 
it is said to have a smooth structure, and it is called 
a smooth manifold. Smooth manifolds are especially 
interesting because they are the natural arena for cal- 
culus. Roughly speaking, they are the most general con- 
text in which the notion of differentiation to any order 
makes intrinsic sense. 


A function f, defined on a manifold, is said to be dif-
ferentiable if, given any of its coordinate charts $\phi : N \to \mathbb{R}^d$,
the function $g(y) = f(\phi^{-1}(y))$ (which is defined
on a region of $\mathbb{R}^d$) is differentiable [I.3 §5.3]. Calculus
is impossible on a manifold if it does not admit charts 
with differentiable transition functions, because a func- 
tion that might appear differentiable in one chart will 
not, in general, be differentiable when viewed from a 
neighboring chart. 

Here is a one-dimensional example to illustrate this
point. Consider the following two coordinate charts of
a neighborhood of the origin in the real line. The first is
the obvious chart that simply represents a real number
x by itself. (Formally speaking, one is taking the func-
tion $\phi$ to be defined by the simple formula $\phi(x) = x$.)
The second represents x by the point $x^{1/3}$. (Here the
cube root of a negative number x is defined to be minus
the cube root of -x.) What is the transition function
between these two charts? Well, if t is a point in the
region of $\mathbb{R}$ used for the first chart, then $\phi^{-1}(t) = t$, so
$\psi(\phi^{-1}(t)) = \psi(t) = t^{1/3}$. This is a continuous function
of t but it is not differentiable at the origin.

Now consider the simplest possible function defined 
on the region used for the second chart, the function 
h(s) = s, and let us work out the corresponding func-
tion f on the manifold itself. The value of f at x should
be the value of h at the point s corresponding to x.
This point is $\psi(x) = x^{1/3}$, so $f(x) = h(x^{1/3}) = x^{1/3}$.
Finally, since the point x in the manifold corresponds
to the point $t = \phi(x) = x$ in the first region, the cor-
responding function on the first region is $g(t) = t^{1/3}$.
(This is the same function as f only because $\phi$ happens
to be the very special map that takes each number to
itself.) Thus, the eminently differentiable function h on 
one coordinate chart translates into the continuous but 
not differentiable function g on the other. 
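
A small numerical sketch may help to make this concrete. It assumes nothing beyond the two charts just described; the names cbrt, phi_inverse, psi, and difference_quotient are chosen here purely for illustration. The difference quotients of g at the origin grow without bound as the step shrinks, whereas those of h are constant.

```python
import math


def cbrt(x):
    """Real cube root; for negative x this is minus the cube root of -x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def phi_inverse(t):
    """Inverse of the first chart, which represents each number by itself."""
    return t


def psi(x):
    """The second chart, which represents x by its cube root."""
    return cbrt(x)


def h(s):
    """The simplest function on the region used for the second chart."""
    return s


def g(t):
    """The same function viewed from the first chart: h(psi(phi^{-1}(t)))."""
    return h(psi(phi_inverse(t)))


def difference_quotient(func, t0, step):
    """Forward difference quotient of func at t0."""
    return (func(t0 + step) - func(t0)) / step


if __name__ == "__main__":
    for k in range(1, 7):
        step = 10.0 ** (-k)
        print(step, difference_quotient(h, 0.0, step), difference_quotient(g, 0.0, step))
```

The quotient for g behaves like the step raised to the power -2/3, so it has no finite limit at the origin.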

Suppose one is given a topological manifold M with 
two sets of charts, both of which have infinitely differ- 
entiable transition functions. Then each set of charts 
gives us a smooth structure on the manifold. Of great 
importance is the fact that these two smooth structures
can be fundamentally different. 

To see what this means, let us call the sets of charts 
K and L. Given a function f, let us call it K-differen-
tiable if it is differentiable from the viewpoint of K,
and L-differentiable if it is differentiable from the view-
point of L. It may easily happen that a function is
K-differentiable without being L-differentiable or vice
versa. However, we can say that K and L give the
same smooth structure on M when there is a map, F,
from M to itself with the following three properties.
First, F is invertible and both F and $F^{-1}$ are contin-
uous. Second, the composition of F with any func-
tion that is K-differentiable is L-differentiable. Third,
the composition of the inverse $F^{-1}$ with any func-
tion that is L-differentiable is K-differentiable. Loosely
speaking, F turns the K-differentiable functions into
L-differentiable ones and $F^{-1}$ turns them back again.
If no such function F exists, then the smooth struc- 
tures given by K and L are considered to be genuinely 
different. 

To see how this plays out, let us look at the one- 
dimensional example again. As noted previously, the 
functions that you deem to be differentiable if you use
the $\phi$-chart are not the same as those you deem to
be differentiable if you use the $\psi$-chart. For example,
the function $x \mapsto x^{1/3}$ is not $\phi$-differentiable but it is
$\psi$-differentiable. Even so, the $\phi$-differentiable and $\psi$-
differentiable sets of functions define the same smooth
structure for the line, since any $\psi$-differentiable func-
tion becomes $\phi$-differentiable once you compose it
with the self-map $F : t \mapsto t^3$.

It is very far from obvious that any manifold can 
have more than one smooth structure, but this turns 
out to be the case. There are also manifolds that are 
entirely lacking in smooth structures. These two facts 
lead directly to the central concern of this essay, the 
long-sought quest for the two holy grails of differential 
topology. 

• A list of all smooth structures on any given topo- 
logical manifold. 

• An algorithm to identify any given smooth struc- 
ture on any given topological manifold with the 
corresponding structure from the list. 

2 What Is Known about Manifolds? 

Much has been accomplished as of the writing of this 
article with respect to the two points listed above. This
said, the task for this part of the article is to summarize
the state of affairs at the beginning of the twenty-first
century. Various examples of manifolds are described 
along the way. 

The story here requires a brief, preliminary digres- 
sion to set the stage. If you have two manifolds and 
you set them side by side without their touching, then 
technically speaking they can be regarded as a single 
manifold that happens to have two components. In 
such a case, one can study the components individually. 
Therefore, in this article I shall talk exclusively about
connected manifolds: that is, manifolds with just one
component. In a connected manifold, one can get from 
any point to any other point without ever leaving the 
manifold. 

A second technical point is that it is useful to distin- 
guish between manifolds such as the sphere, which are 
bounded in extent, and manifolds such as the plane,
which go off to infinity. More precisely, I am talking
about the distinction between compact [III.9] and non-
compact manifolds: a compact manifold can be thought
of as one that can be expressed as a closed bounded
subset of $\mathbb{R}^n$ for some n. The discussion that fol-
lows will be almost entirely about compact manifolds.
As some of the examples below will demonstrate, the
story for compact manifolds is less convoluted than
the analogous story for noncompact ones. For sim-
plicity I shall often use the word “manifold” to mean
“compact manifold”; it will be clear from the context if 
noncompact manifolds are also being discussed. 

2.1 Dimension 0 

There is only one dimension-0 manifold. It is a single 
point. The period at the end of this sentence looks, 
from afar, like a connected, dimension-0 manifold. Note 
that the distinction between topological and smooth is 
irrelevant here. 

2.2 Dimension 1 

There is only one compact, connected, one-dimensional 
topological manifold, namely the circle. Moreover, the 
circle has just one smooth structure. Here is one way to 
represent this structure. Take as a representative cir-
cle the unit circle in the xy-plane, that is, the set of
all points (x, y) with $x^2 + y^2 = 1$. This can be cov-
ered by two overlapping intervals, each of which cov-
ers slightly more than half of the circle. The intervals
$U_1$ and $U_2$ are drawn in figure 3. Each interval consti-
tutes a coordinate chart. The one on the left, $U_1$, can
be parametrized in a continuous fashion by taking the
angle of a given point as measured counterclockwise
from the positive x-axis. For example, the point (1,0)
has angle 0, and the point (-1,0) has angle $\pi$. In order
to parametrize $U_2$ by angle, you will have to start with
angle $\pi$ at the negative x-axis. If you move around $U_2$,
varying this angle continuously, then when you reach
the point (1,0) you will have parametrized it as a point
in $U_2$ using the angle $2\pi$.

Figure 3 Two charts that cover the circle.

Figure 4 The intersection of the arcs $U_1$ and $U_2$.

Figure 5 A knotted loop in 3-space.

As you can see, the arcs $U_1$ and $U_2$ intersect in two
separated, smaller arcs; these are labeled $V_1$ and $V_2$ in
figure 4. The transition function on $V_1$ is the identity
map, since the $U_1$ angle of any given point in $V_1$ is the
same as its $U_2$ angle. By contrast, the $U_2$ angle of a point
in $V_2$ is obtained from the $U_1$ angle by adding $2\pi$. Thus,
the transition function on $V_2$ is not the identity map but
the map that adds $2\pi$ to the coordinate function.
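
The two angle charts and their transition function can be sketched in a few lines. This is an illustration only: the overlap half-width DELTA and the function names chart_u1, chart_u2, and transition are assumptions made here, not part of the text. The first chart is taken to use angles slightly beyond the range from 0 to $\pi$, and the second to use angles slightly beyond the range from $\pi$ to $2\pi$.

```python
import math

# Hypothetical half-width of each overlap region; any small positive
# number below pi/2 would serve equally well.
DELTA = 0.3

U1_RANGE = (-DELTA, math.pi + DELTA)
U2_RANGE = (math.pi - DELTA, 2.0 * math.pi + DELTA)


def _angle_in(point, lo, hi):
    """Return the angle of point that lies strictly between lo and hi,
    or None if the point is not covered by that chart."""
    x, y = point
    base = math.atan2(y, x)
    for k in (-1, 0, 1, 2):
        candidate = base + 2.0 * math.pi * k
        if lo < candidate < hi:
            return candidate
    return None


def chart_u1(point):
    """Angle coordinate of a point in the first arc."""
    return _angle_in(point, *U1_RANGE)


def chart_u2(point):
    """Angle coordinate of a point in the second arc."""
    return _angle_in(point, *U2_RANGE)


def transition(point):
    """Second-chart angle minus first-chart angle, where both are defined."""
    a1, a2 = chart_u1(point), chart_u2(point)
    if a1 is None or a2 is None:
        return None
    return a2 - a1


def on_circle(theta):
    """The point of the unit circle at angle theta."""
    return (math.cos(theta), math.sin(theta))


if __name__ == "__main__":
    print(transition(on_circle(math.pi + 0.1)))  # overlap near (-1, 0)
    print(transition(on_circle(0.1)))            # overlap near (1, 0)
    print(transition(on_circle(math.pi / 2)))    # covered by one chart only
```

On the overlap near (-1,0) the two coordinates agree, while on the overlap near (1,0) they differ by $2\pi$, as described above.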

This one-dimensional example brings up a number of
important issues, all related to a particularly troubling
question. To state it, consider first that there are lots of
closed loops in the plane that can be taken as model cir-
cles. Indeed, the word “lots” considerably understates 
the situation. Moreover, why should we restrict our 
attention to circles in the plane? There are closed loops 
galore in 3-space too: see figure 5, for example. For 
that matter, any manifold of dimension greater than 1
has smooth loops. Earlier, it was asserted that there is
just one smooth, compact, connected, one-dimensional 
manifold, so all of these loops must be considered the 
“same.” Why is this? 

Here is the answer. We often think of a manifold as 
it might appear were it sitting in some larger space. 
For example, we might imagine a circle sitting in the 
plane, or sitting knotted in three-dimensional Euclid- 
ean space. However, the notion of “smooth manifold” 
introduced above is an intrinsic one, in the sense that it 
does not depend on how the manifold is placed inside 
a higher-dimensional space. Indeed, it is not even nec- 
essary for there to be a higher-dimensional space at all. 
In the case of the circle, this can be said in the following 
way. The circle can be placed as a loop in the plane, or as
a knot in 3-space, or whatever. Each view of the circle in
a higher-dimensional Euclidean space defines a collec-
tion of functions that are considered differentiable: one
just takes the differentiable functions of the coordin-
ates of the big Euclidean space and restricts them to the
circle. As it turns out, any one such collection defines
the same smooth structure on the circle as any other.
Thus, the smooth structures that are provided by these
different views of a circle are all the same, even though
there are many interesting ways of placing a circle in a
given higher-dimensional space. (In fact, the classifica-
tion of knots in 3-space is a fascinating, vibrant topic 
in its own right: see knot polynomials [III.46].) 

How is it proved that there is only one smooth struc-
ture for the circle? For that matter, how is it proved
that there is but a single compact topological manifold 
in dimension 1? Since this article is not meant to pro- 
vide proofs, these questions are left as serious exer- 
cises with the following advice. Think hard about the 
definitions and, for the smooth-manifold question, use 
some calculus. 

2.3 Dimension 2 

The story for two-dimensional, connected, compact
manifolds is much richer than that for dimension 1.
In the first place, there is a basic dichotomy between
two kinds of manifold: orientable and nonorientable.
Roughly speaking, this is the distinction between man-
ifolds that have two sides and those that have just
one. To give a more formal definition, a two-dimen-
sional manifold is called orientable if every loop in
the manifold that does not cross itself or have any
kinks has two distinct sides. This is to say that there
is no path from one side of the loop to the other
that avoids the loop yet remains very close to it. The
Möbius strip (see figure 6) is not orientable because
there are paths from one side of the central loop to
the other that do not cross the central loop yet remain
very close to it. The orientable, compact, connected,
topological, two-dimensional manifolds are in one-to-
one correspondence with a collection of fundamental
foods: the apple, the doughnut, the two-holed pretzel,
the three-holed pretzel, the four-holed pretzel, and so
on (see figure 7). Technically, they are classified by an
integer, called the genus. This is 0 for the sphere, 1 for
the torus, 2 for the two-holed torus, etc. The genus
counts the number of holes that appear in a given exam-
ple from figure 7. To say that this classifies them is to
say that two such manifolds are the same if and only
if they have the same genus. This is a theorem due to
Poincaré [VI.61].

Figure 8 Cutting and gluing.

As it turns out, every topological two-dimensional 
manifold has exactly one smooth structure, so the list 
in figure 7 is the same as the list of the smooth ori- 
entable two-dimensional manifolds. Here one should 
keep in mind that the notion of a smooth manifold is 
intrinsic, and therefore independent of how the man- 
ifold is represented as a surface in 3-space, or in any 
other space. For example, the surfaces of an orange, 
a banana, and a watermelon each represent embed- 
ded images of the two-dimensional sphere, the leftmost 
example in figure 7. 

The shapes illustrated in figure 7 suggest an idea that 
plays a key role when it comes to classifying manifolds 
of higher dimensions. Notice that the two-holed torus
can be viewed as the result of taking two one-holed tori,
cutting disks out of both, gluing the results together
across their boundary circles, and then smoothing the
corners. This operation is depicted in figure 8. This sort
of cutting and gluing operation is an example of what 
is called a surgery. The analogous surgery can also be 
done with a one-holed torus and a two-holed torus to 
obtain a three-holed torus. And so on. Thus, all of the 
oriented two-dimensional manifolds can be built using 
standard surgeries on copies of just two fundamental 
building blocks: the one-holed torus and the sphere. 
Here is a nice exercise to test your understanding of 
this process. Suppose that you perform a surgery, as 
in figure 8, on a sphere and another manifold M. Prove 
that the resulting manifold is the same, with regard to 
its topological and smooth structure, as M. 

As it turns out, all of the nonorientable two-dimen- 
sional manifolds can be built using a version of surgery 
that first cuts a disk out of an orientable two-dimen-
sional manifold and then glues on a Möbius strip.
To be more precise, note that the Möbius strip has
a circle as its boundary. Cut a disk out of any given 
orientable, two-dimensional manifold and the result 
also has a circular boundary. Glue the latter circular 
boundary to the Möbius strip boundary, smooth the
corners, and the result is a smooth manifold that is 
nonorientable. Every nonorientable, topological (and 
thus every nonorientable, smooth), two-dimensional 
manifold is obtained in this way. Moreover, the man- 
ifold you get depends only on the number of holes (the 
genus) of the orientable manifold that is used. 

The manifold obtained from the surgery of a Möbius
strip with a sphere is called the projective plane. The
one that uses the Möbius strip and the torus is called
the Klein bottle. These shapes are illustrated in figure 9.
No nonorientable example can be put into three-dimen-
sional Euclidean space in a clean way; any such place-
ment is forced to have portions that pass through other 
portions, as can be seen in the illustration of the Klein 
bottle. 

How does one prove that the list given above ex- 
hausts all two-dimensional manifolds? One method 
uses versions of the geometric techniques that are 
discussed below in the three-dimensional context. 

2.4 Dimension 3 

There is now a complete classification of all smooth, 
three-dimensional manifolds; this is a very recent 
achievement. There has been for some time a conjec- 
tured list of all three-dimensional manifolds, and a con- 
jectured procedure for telling one from the other. The
proof of these conjectures was recently completed by
Grisha Perelman; this is a much-celebrated event in
the mathematics community. The proof uses geometry
about which more is said in the final part of this article.
Here I shall concentrate on the classification scheme.

Figure 7 The orientable manifolds of dimension 2
(sphere, one-holed torus, two-holed torus).

Figure 9 Two nonorientable surfaces (projective plane,
Klein bottle). To form the projec-
tive plane, one identifies the boundary of the Möbius strip
with the boundary of the hemisphere.

Before getting to the classification scheme, it is nec- 
essary to introduce the notion of a geometric structure 
on a manifold. Roughly speaking, this means a rule for 
defining the lengths of paths on the manifold. This
rule must satisfy the following conditions. The con- 
stant path that simply stays at one point has length 0, 
but any path that moves at all has positive length. Sec- 
ond, if one path starts where another ends, the length 
of their concatenation (that is, the result of putting the 
two paths together) is the sum of their lengths. 

Note that a rule of this sort for path lengths leads nat-
urally to a notion of distance $d(x, y)$ between any two
points x and y on the manifold: one takes the length of
the shortest path between them. It turns out to be par-
ticularly interesting when $d(x, y)^2$ varies as a smooth
function of x and y.

As it happens, there is nothing special about having 
a geometric structure. Manifolds have them in spades. 
The following are three very useful geometric struc- 
tures for the interior of the ball of radius 2 about the 
origin in n-dimensional Euclidean space. In these for-
mulas, the given path is viewed as if drawn in real time
by some hyper-dimensional artist, with x(t) denoting 
the position of the pencil tip on the path at time t. Here, 
t ranges over some interval of the real line: 


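$$
\begin{aligned}
\text{length} &= \int |\dot{x}(t)| \, dt; \\
\text{length} &= \int |\dot{x}(t)| \, \frac{1}{1 + \tfrac{1}{4}|x(t)|^2} \, dt; \\
\text{length} &= \int |\dot{x}(t)| \, \frac{1}{1 - \tfrac{1}{4}|x(t)|^2} \, dt.
\end{aligned}
\tag{1}
$$

In these formulas, $\dot{x}$ denotes the time-derivative of the
path $t \mapsto x(t)$.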

The first of these geometric structures leads to the 
standard Euclidean distance between pairs of points. 
For this reason it is called the Euclidean geometry for 
the ball. The second defines what is called spherical 
geometry because the distance between any two points 
is the angle between certain corresponding points in 
the sphere of radius 1 in (n + 1)-dimensional Euclidean
space. The correspondence comes from an
(n + 1)-dimensional version of the stereographic projection
that is used for maps of the Earth’s polar regions. The 
third distance function defines what is called the hyper- 
bolic geometry on the ball. This arises when the ball 
of radius 2 in n-dimensional Euclidean space is iden- 
tified in a certain way with a particular hyperbola in 
(n + 1)-dimensional Euclidean space.
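
As an illustrative aside, here is a minimal Python sketch that approximates the length of a path under each of the three rules. It assumes that the three integrands are |x'(t)| multiplied by 1, by 1/(1 + |x|^2/4), and by 1/(1 - |x|^2/4), respectively; the function names and the sample path are hypothetical choices made only for this illustration.

```python
import math

def conformal_factor(kind, r2):
    """Weight multiplying |x'(t)|; r2 is |x|^2 (assumed forms, see text above)."""
    if kind == "euclidean":
        return 1.0
    if kind == "spherical":
        return 1.0 / (1.0 + r2 / 4.0)
    if kind == "hyperbolic":
        return 1.0 / (1.0 - r2 / 4.0)
    raise ValueError(kind)

def path_length(path, kind, steps=20000):
    """Approximate the length of t -> path(t), 0 <= t <= 1, by summing
    chord lengths weighted by the factor at each chord's midpoint."""
    total = 0.0
    prev = path(0.0)
    for k in range(1, steps + 1):
        cur = path(k / steps)
        chord = math.sqrt(sum((a - b) ** 2 for a, b in zip(cur, prev)))
        mid_r2 = sum(((a + b) / 2.0) ** 2 for a, b in zip(cur, prev))
        total += chord * conformal_factor(kind, mid_r2)
        prev = cur
    return total

def radial(t):
    """Straight segment from the origin to the point (1, 0)."""
    return (t, 0.0)

for kind in ("euclidean", "spherical", "hyperbolic"):
    print(kind, round(path_length(radial, kind), 6))
```

For this radial segment the three integrals can also be done by hand, giving 1, 2 arctan(1/2), and 2 artanh(1/2), so the same Euclidean segment is shorter than 1 in the spherical structure and longer than 1 in the hyperbolic one.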

The geometric structures that are depicted in (1) turn 
out to be symmetrical with respect to rotations and 
certain other transformations of the unit ball. (You 
can read more about Euclidean, spherical, and
hyperbolic geometry in some fundamental mathematical
definitions [I.3 §§6.2, 6.5, 6.6].)

As was remarked above, there are very many geo- 
metric structures on any given manifold and so one 
might hope to find one that has some particularly desir- 
able properties. With this goal in mind, suppose that I 
have specified some “standard” geometric structure S 
for the ball in R^n to serve as a model of an
exceptionally desirable structure. This could be one of the ones
I have just defined or some other favorite. This leads 
to a corresponding notion of the structure S for a
compact manifold. Roughly speaking, one says that a
geometric structure on a manifold is of the type S if every
point in the manifold feels as though it belongs to the
unit ball with the structure S, that is, if one can use the
structure S on the ball to provide coordinate charts that
respect the geometric structure on the manifold. To be
more precise, suppose that I am defining a coordinate
system in a small neighborhood N of x by means of a
function φ : N → R^d. If I can always do this in such a
way that the image φ(N) lies inside the ball, and such
that the distance between any two points x and y in
N equals the distance between their images φ(x) and
φ(y) defined in terms of the structure S on the ball,
then I will say that the manifold has structure of type S.
In particular, a geometric structure is said to be
Euclidean, spherical, or hyperbolic when the structure on the
ball is Euclidean, spherical, or hyperbolic, respectively.

For example, the sphere in any dimension has a 
spherical geometric structure (as it should!). As it turns 
out, every two-dimensional manifold has a geometric 
structure that is either spherical, Euclidean, or
hyperbolic. Moreover, if it has a structure of one of these
types, then it cannot have one of a different type. In
particular, the sphere has a spherical structure, but not a
Euclidean or hyperbolic structure. Meanwhile, the torus
in dimension 2 has a Euclidean geometric structure but
only a Euclidean one, and all of the other manifolds
listed in figure 7 have hyperbolic geometric structures
and only hyperbolic ones.

William Thurston had the great insight to realize
that three-dimensional manifolds might be classifiable 
using geometric structures. In particular, he made what 
was known as the geometrization conjecture, which 
says, roughly speaking, that every three-dimensional 
manifold is made up of “nice” pieces: 

Every smooth three-dimensional manifold can be cut in 
a canonical fashion along a predetermined set of two- 
dimensional spheres and one-holed tori so that each of 
the resulting parts has precisely one of a list of eight
possible geometric structures. 

The eight possible structures include the spherical, 
Euclidean, and hyperbolic ones. These plus the other 
five are, in a sense that can be made precise, those 
that are maximally symmetric. The other five are
associated with various Lie groups [III.50 §1], as are the
listed three.

Since its proof by Perelman, the geometrization
conjecture has come to be known as the geometrization
theorem. As I shall explain in a moment, this provides
a satisfactory resolution of the three-dimensional part 
of the quest set out at the end of section 1. This is 
because a manifold with one of the eight geometric 
structures can be described in a canonical fashion using 
group theory. As a result, the geometrization theorem 
turns the classification issue for manifolds into a ques- 
tion that group theory can answer. What follows is an 
indication of how this comes about. 

Each of the eight geometric structures has an associ- 
ated model space which has the given geometric struc- 
ture. For example, in the case of the spherical struc- 
ture, the model space is the three-dimensional sphere. 
For the Euclidean structure, the model space is the
three-dimensional Euclidean space. For the hyperbolic
structure, it is the hyperbola in the four-dimensional
Euclidean space, where the coordinates (x, y, z, t) obey
t^2 = 1 + x^2 + y^2 + z^2. In all of the eight cases, the
model space has a canonical group of self-maps that
preserve the distance between any pair of points. In the
Euclidean case, this group is the group of translations
and rotations of the three-dimensional Euclidean space.
In the spherical case, it is the group of rotations of the
four-dimensional Euclidean space, and in the
hyperbolic case, it is the group of Lorentz transformations
of four-dimensional Minkowski space. The associated
group of self-maps is called the isometry group for the 
given geometric structure. 
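
To make the hyperbolic case a little more concrete, the following Python sketch checks numerically that one sample Lorentz transformation maps the set t^2 = 1 + x^2 + y^2 + z^2 to itself and preserves distances on it. The helper names are hypothetical, the boost along the x-axis is just one convenient choice of isometry, and the distance is computed with the standard arccosh formula for this model, which is an assumption brought in from outside the text.

```python
import math

def lift(x, y, z):
    """Point (x, y, z, t) of the model space t^2 = 1 + x^2 + y^2 + z^2, t > 0."""
    return (x, y, z, math.sqrt(1.0 + x * x + y * y + z * z))

def minkowski(p, q):
    """Indefinite product t_p*t_q - x_p*x_q - y_p*y_q - z_p*z_q."""
    return p[3] * q[3] - p[0] * q[0] - p[1] * q[1] - p[2] * q[2]

def boost_x(p, rapidity):
    """A sample Lorentz transformation: a boost mixing x and t."""
    x, y, z, t = p
    c, s = math.cosh(rapidity), math.sinh(rapidity)
    return (c * x + s * t, y, z, s * x + c * t)

def hyperbolic_distance(p, q):
    """Distance between two points of the model space (arccosh formula)."""
    # max() guards against rounding pushing the argument slightly below 1.
    return math.acosh(max(1.0, minkowski(p, q)))

p, q = lift(0.3, -1.2, 0.5), lift(-0.8, 0.1, 2.0)
bp, bq = boost_x(p, 0.7), boost_x(q, 0.7)
print(minkowski(bp, bp), minkowski(bq, bq))  # both should be close to 1
print(hyperbolic_distance(p, q), hyperbolic_distance(bp, bq))
```

The two printed distances should agree up to rounding, which is what it means for the boost to belong to the isometry group.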

The connection between manifolds and group theory 
arises because a certain set of discrete subgroups of the 
isometry group of any one of the eight model spaces 
determines a compact manifold with the correspond- 
ing geometric structure. (A subgroup is called discrete 
if every point in the subgroup is isolated, meaning that 
it belongs to a neighborhood that contains no other 
points from the subgroup.) This compact manifold is 
obtained as follows. Two points x and y in the model 
space are declared to be equivalent if there is an isom- 
etry T, belonging to the subgroup, such that Tx = y. 
In other words, x is equivalent to all its images under 
isometries from the subgroup. It is easy to check that 
this notion of equivalence is a genuine equivalence 
relation [I.2 §2.3]. The equivalence classes are then
in one-to-one correspondence with the points of the 
associated compact manifold. 

Figure 10 A link formed out of two knots.

Here is a one-dimensional example of how this works.
Think of the real line as a model space whose isometry
group is the group of translations. The set of
translations by integer multiples of 2π forms a discrete
subgroup of this group. Given a point t in the real line, the
possible images under translations from the subgroup
are all the numbers of the form t + 2πn, where n is an
integer, so one regards two real numbers as equivalent
if they differ by a multiple of 2π, and the equivalence
class of t is {t + 2πn : n ∈ Z}. One can associate with
this equivalence class the point (x, y) = (cos t, sin t) in
the circle, since adding a multiple of 2π to t does not
affect either its sine or its cosine. (Intuitively speaking,
if you regard each t as equivalent to t + 2π, then you
are wrapping the real line around and around a circle.)
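
The one-dimensional example can be tried out directly. The Python sketch below (with hypothetical function names) tests whether two real numbers are equivalent under translation by multiples of 2π, picks a representative of each class in the interval [0, 2π), and maps a class to its point on the circle.

```python
import math

TWO_PI = 2.0 * math.pi

def equivalent(s, t, tol=1e-9):
    """True if s and t differ by an integer multiple of 2*pi (up to rounding)."""
    n = round((s - t) / TWO_PI)
    return abs((s - t) - n * TWO_PI) < tol

def representative(t):
    """Member of the class {t + 2*pi*n : n in Z} lying in [0, 2*pi)."""
    return t % TWO_PI

def to_circle(t):
    """Point of the unit circle associated with the class of t."""
    return (math.cos(t), math.sin(t))

t = 0.75
for n in (-3, 0, 5):
    s = t + n * TWO_PI
    print(n, equivalent(s, t), representative(s), to_circle(s))
```

Every translate of t lands on the same point of the circle, up to floating-point rounding, which is the sense in which the equivalence classes correspond to points of the circle.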

This association between certain subgroups of the 
isometry group and compact manifolds with the given 
geometric structure goes in the other direction as well. 
That is, the subgroup can be recovered from the man- 
ifold in a relatively straightforward fashion using the 
fact that each point in the manifold lies in a coordinate
chart where its distance function is the same as that of
the associated model space. 

Even before Perelman’s work there was a tremendous
amount of evidence for the validity of the
geometrization conjecture, much of it supplied by Thurston. In
order to discuss this evidence, a small digression is 
required to give some of the background. First, I need 
to bring in the notion of a link in the three-dimensional 
sphere. A link is the name given to a finite disjoint 
union of knots. Figure 10 depicts an example of one 
that is made out of two knots. 

I also need the notion of surgery on a link. To this end, 
thicken the link so as to view it as a union of knotted, 
solid tubes. (Think of the knot as the copper in an insu- 
lated wire and view the solid tube as the copper plus 
the surrounding insulation.) Notice that the boundary 
of any given component tube is really a copy of our 
one-holed torus from figure 7. Therefore, removing any 
one of the tubes leaves a tubular-shaped missing region 
from the three-dimensional sphere whose boundary is 
a torus. 


Now, to define a surgery, imagine removing a knotted 
tube and then gluing it back in a different way. That is, 
imagine gluing the boundary of the tube to the
boundary of the resulting missing region using an
identification that is not the same as the original. For example,
take the “unknot,” a standard round circle in a given 
plane, here viewed as living inside a coordinate chart 
of the three-dimensional sphere. Take out the solid 
tube around it, and then replace the tube by gluing the 
boundary in the “wrong” way, as follows. Consider the 
leftmost torus in figure 11 as the boundary of the
complement of the tube in R^3. Consider the middle torus
as the inside of the tube. The “wrong” gluing identifies 
the circles marked “R” and “L” on the leftmost torus 
with their counterparts on the middle torus. The result- 
ing space is a three-dimensional manifold which turns 
out to be the product of the circle with the two-dimen- 
sional sphere. That is to say, it is the set of ordered 
pairs (x, y), where x is a point in the circle and y is
a point in the two-dimensional sphere. There are many 
other possible ways to glue the boundary torus, and 
almost all of the corresponding surgeries give rise to 
distinct three-dimensional manifolds. One of these is
illustrated in the rightmost part of figure 11. 

In general, given any link one can construct a
countably infinite set of distinct, smooth three-dimensional
manifolds by using surgeries on it. Furthermore, Ray- 
mond Lickorish proved that every three-dimensional 
manifold can be obtained by using surgery on some 
link in the three-dimensional sphere. Unfortunately, 
this characterization of three-dimensional manifolds 
via surgeries on links does not provide a satisfactory 
resolution to the central quest of classifying smooth 
structures because the process is far from unique: for 
any given manifold there is a bewildering assortment 
of links and surgeries that can be used to produce it. 
Moreover, as of this writing, there is no known way 
to classify knots and links in the three-dimensional 
sphere. 

In any event, here is a taste of Thurston’s evidence 
for his geometrization conjecture. Given any link, all 
but finitely many of the three-dimensional manifolds 
you can produce from it by surgery satisfy the conclu- 
sions of the geometrization conjecture. Thurston also 
proved that, given any knot apart from the unknot, all 
but finitely many surgeries on it produce a manifold 
with a hyperbolic geometric structure. 

By the way, Perelman’s proof of the geometrization 
theorem gives as a special case a proof of the Poincaré 
conjecture, proposed by Poincaré in 1904. To state this
we need the notion of a simply connected manifold. This
is a manifold with the property that any closed loop in 
it can be shrunk down to a point. To be more precise, 
designate a point in the manifold as the “base point.” 
Then any path in the manifold that starts and ends at 
the chosen base point can be continuously deformed 
in such a way that at each stage of the deformation 
the path still starts and ends at the base point, and so 
that the end result is the trivial path that starts at the 
base point and just stays there. For example, the two- 
dimensional sphere is simply connected, but the torus 
is not, since a loop that goes “once around” the torus 
(for example, any of the loops R or L in the various 
tori of figure 11) cannot be shrunk to a point. In fact,
a sphere is the only two-dimensional manifold that is 
simply connected, and spheres are simply connected in 
all dimensions greater than 1. 

The Poincaré conjecture. Every compact, simply con- 
nected, three-dimensional manifold is the three-dimen- 
sional sphere. 

2.5 Dimension 4 

This is the weird dimension. Nobody has managed to 
formulate a useful and viable conjecture for the
classification of smooth, compact, four-dimensional
manifolds. On the other hand, the classification story for
many categories of topological four-dimensional
manifolds is well-understood. For the most part, this work
is by Michael Freedman. 

Some of the topological manifolds in dimension 4 do 
not admit smooth structures. The so-called “11/8
conjecture” proposes necessary and sufficient conditions
for a four-dimensional, topological manifold to have
at least one smooth structure. The fraction 11/8 here
refers to the absolute value of the ratio of the rank
to the signature of a certain symmetric, bilinear form
that appears in the four-dimensional story. The case 0/0
excepted, the conjecture asserts that a smooth
structure exists if and only if this ratio is at least 11/8. The
bilinear form in question is obtained by counting with
signed weights the intersection points between
various two-dimensional surfaces inside the given
four-dimensional manifold. In this regard, note that a
typical pair of two-dimensional surfaces in four dimensions
will intersect at finitely many points. This is a
higher-dimensional analogue of a fact that is rather easier to
visualize: that a typical pair of loops in the
two-dimensional plane will intersect at finitely many points. Not
surprisingly, the bilinear form here is called the
intersection form; it plays a prominent role in Freedman’s
classification theorems.

Meanwhile, the problem of listing all smooth struc- 
tures is wide open in four dimensions: there are no 
cases of a topological manifold with at least one 
smooth structure where the list of distinct
structures is known to be complete. Some topological
four-dimensional manifolds are known to have (countably)
infinitely many distinct smooth structures. For
others there is only one known structure. For example,
the four-dimensional sphere has one obvious smooth
structure and this is the only one known. However,
the underlying topological manifold may, for all
anyone knows, have many distinct smooth structures. By
the way, the story for noncompact manifolds in dimen- 
sion 4 is truly bizarre. For example, it is known that 
there are uncountably many smooth manifolds that 
are homeomorphic to the standard, four-dimensional 
Euclidean space. But even here, our understanding is 
less than optimal since there is no known explicit 
construction of a single one of these “exotic” smooth 
structures. 

Simon Donaldson provided a set of geometric invari- 
ants that have the power to distinguish smooth struc- 
tures on a given topological 4-manifold. Donaldson’s 
invariants were recently superseded by a suite of more 
computable invariants; these were proposed by Edward 
Witten and are called the Seiberg-Witten invariants. 
More recently still, Peter Ozsváth and Zoltán Szabó
designed a possibly equivalent set of invariants that 
are even easier to use. Do the Seiberg-Witten invariants 
(broadly defined) distinguish all smooth structures? No 
one knows. A bit more is said about these invariants in
the final part of this article. 

Note that Freedman’s results include the topologi- 
cal version of the four-dimensional Poincaré conjecture 
that follows. 

The four-dimensional sphere is the only compact,
topological 4-manifold with the following property: every
based map from either a one-dimensional circle or a 
two-dimensional sphere can be continuously deformed 
so that the result maps onto the base point. 

The smooth version of this conjecture has not been 
resolved. 

Is there a four-dimensional version of the geometri- 
zation conjecture/theorem? 

2.6 Dimensions 5 and Greater 

Surprisingly enough, the issues raised at the end of 
the first section have more or less been resolved in 
all dimensions that are greater than 4. This was done 
some time ago by Stephen Smale with input from John 
Stallings. In these higher dimensions it is also possi- 
ble to say what conditions need to hold in order for a 
topological manifold to admit a smooth structure. For 
example, John Milnor and others determined that the 
respective number of smooth structures on the spheres 
of dimensions 5-18 are as follows: 1, 1, 28, 2, 8, 6, 992, 
1, 3, 2, 16 256, 2, 16, 16. 

At first sight, it is surprising that the dimensions 
greater than 4 are easier to deal with than dimen- 
sions 3 and 4. However, there is a good reason for this. 
It turns out that there is more room to maneuver in 
these higher-dimensional spaces and this extra room 
makes all the difference. To get a sense for this, let n 
be a positive integer, and let S^n denote the
n-dimensional sphere. To make this more concrete, view S^n as
the set of points (x_1, ..., x_{n+1}) in the Euclidean
space R^{n+1} such that x_1^2 + ... + x_{n+1}^2 = 1. Now
consider the product manifold, S^n × S^n. This is the set
of pairs of points (x, y), where x is in one copy of S^n
and y is in another. This product manifold has
dimension 2n. A standard picture of S^n × S^n has two
distinguished copies of S^n inside it, one consisting of all
points of the form (x, y) with y = (1, 0, ...) and the
other consisting of all points (x, y) with x = (1, 0, ...).
Let us call the first copy S_R and the second one S_L. Of
particular interest here is the fact that S_R and S_L
intersect in precisely one point, the point
((1, 0, ...), (1, 0, ...)).

By the way, in the n = 1 case, the space S^1 × S^1 is
the doughnut in figure 7. The one-dimensional spheres
S_R and S_L inside it are the circles that are drawn in the
leftmost diagram in figure 11.

If you are with me so far, suppose now that an 
advanced alien en route from Arcturus to the galactic 
center kidnaps you and drops you into some unknown, 
2n-dimensional manifold. You suspect that it is S^n × S^n,
but are not sure. One reason that you suspect this to
be the case is that you have found a pair of
n-dimensional spheres in it, one you call M_R and the other you
call M_L. Unfortunately, they intersect in 2N + 1 points,
where N > 0. You would be less nervous about things if
you could find a pair of different spheres that intersect
precisely once. So you wonder whether perhaps you can
push M_L around a bit so as to remove the 2N unwanted
intersection points.

The surprise here is that the issue of removing inter- 
section points in any dimension concerns only certain 
zero-, one-, and two-dimensional manifolds that live
inside your 2n-dimensional one. This is an old
observation due to Hassler Whitney. In particular, Whitney
discovered that in the 2n-dimensional manifold you must
be able to find a disk of dimension two whose
boundary loop lies half in M_L and half in M_R. This boundary
loop must hit two of the intersection points (one when
it passes from M_L to M_R and one when it passes back
again). The disk must also stick out orthogonally to M_L
and M_R where it touches them. If its interior is disjoint
from both M_L and M_R, and if there are no points where
the disk comes back to intersect itself, then you can
push the part of M_L that is very near the disk along the
disk while stretching the remaining part to keep things
from tearing. If you extend the disk a bit past M_R, then
you will have removed two of the intersection points
when you have pushed past the end of the disk.
Figure 12 is a schematic of this. This pushing operation
(the Whitney trick) can be performed in any manifold 
of any dimension if you can find the required disk. The 
problem is to find the disk. Figure 13 is a drawing of a 
cross-sectional slice showing a “good” disk on the left 
and some badly chosen disks in the middle and on the 
right. If you have a badly chosen disk that neverthe- 
less satisfies the required boundary conditions, then 
you might hope to find a tiny wiggle of its interior that 
makes it better. You would like the new disk to have no 
self-intersection points and you would like its interior
to be disjoint from both M_L and M_R. No wiggle along a
direction that is parallel to the disk itself will help, for
any such wiggle only changes the position of the
intersection point in the disk. Likewise, a wiggle in a
direction parallel to the offending M_L or M_R is useless since
it only changes the position of the intersection point
in the latter space. Thus, 2 + n of the 2n dimensions
are useless when it comes to wiggling a disk. However,
there are 2n − (n + 2) = n − 2 remaining dimensions to
work with, which is a positive number when 2n > 4. In
fact, when this is true a generic wiggle in any of these
extra dimensions does the trick.
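
The dimension count is simple enough to tabulate. The short Python sketch below (the function name is a hypothetical label) just evaluates 2n − (n + 2) for a few values of n.

```python
def free_wiggle_dimensions(n):
    """Directions left for wiggling a 2-disk inside a 2n-manifold:
    2 are parallel to the disk and n to the offending sphere."""
    ambient = 2 * n
    useless = 2 + n
    return ambient - useless

for n in range(2, 6):
    d = free_wiggle_dimensions(n)
    verdict = "room to wiggle" if d > 0 else "no room"
    print(f"2n = {2 * n}: {d} spare dimension(s), {verdict}")
```

Only the case 2n = 4 leaves no spare direction, which is the dimension singled out in the discussion that follows.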

Now, when 2n = 4 (so n = 2) there are no extra 
dimensions, and, consequently, no small wiggle can 
make a new disk without intersection points. So if a 
given candidate disk intersects M_R, then the Whitney
trick just trades the old pair of intersection points for
a new collection. If the disk intersects either itself or
M_L, then the new version of M_L has self-intersection
points: that is, points where one part has come around 
to intersect another. 

This failure of the Whitney trick is the bane of
four-dimensional topology. Thus, a major lemma for Michael
Freedman’s classification theorem about topological 
four-dimensional manifolds describes ubiquitous cir- 
cumstances where a topologically (but not smoothly!) 
embedded disk can be found for use in the Whitney 
trick. 


3 How Geometry Enters the Fray 

Much of our current understanding about smooth man- 
ifolds in dimensions 4 or less has come via what 
might be called geometric techniques. The search for a 
canonical geometric structure on a given three-dimen- 
sional manifold is an example. Perelman’s proof of the 
geometrization theorem proceeds in this manner. The 
idea is to choose any convenient geometric structure 
on a given three-dimensional manifold and then
continuously deform it by some well-defined rule. If one
views the deformation as a time-dependent process, 
then the goal is to design the deformation rule to make 
the geometric structure ever more symmetric as time 
goes on. 

A rule introduced and much studied by Richard 
Hamilton and then used by Perelman specifies the time- 
derivative of the geometric structure at any given time 
in terms of certain of its properties at that time. It is 
a nonlinear version of the classical heat equation 
[I.3 §5.4]. For those unfamiliar with the latter, the
simplest version modifies functions on the real line and
will now be described. Let τ denote the time
parameter, and let f(x) denote a given function on the line,
representing the initial distribution of heat. The
resulting time-dependent family of functions associates with
any given positive value for τ a function, F_τ(x), which
represents the distribution of heat at time τ. The partial
derivative of F_τ(x) with respect to τ is equal to its
second partial derivative with respect to x, and the initial
condition is that F_0(x) = f(x). If the initial function f
is zero outside some interval, then one can write down
a formula for F_τ:

$$F_\tau(x) = \frac{1}{(4\pi\tau)^{1/2}} \int_{-\infty}^{\infty} e^{-(x-y)^2/4\tau} f(y) \, dy. \qquad (2)$$
One can see from (2) that F_τ(x) tends uniformly to zero
in x as τ tends to infinity. In particular, this limit is
completely ignorant of the starting function f; and,
being identically zero, it is also the most symmetric
function possible. The representation for F_τ in (2)
indicates how this comes about. The value of F_τ at any given
point is a weighted average of the values of the original
function. Moreover, as τ increases, this average looks
more like the standard average over ever-larger regions 
of the line. Physically this is very plausible as well: the 
heat spreads itself out more and more thinly as time 
goes on. 
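
The spreading-out of heat can be watched numerically. The Python sketch below is only an illustration under stated assumptions: the initial function f is taken to be 1 on the interval [-1, 1] and 0 elsewhere (an arbitrary choice), and the weighted average uses the Gaussian weight exp(-(x - y)^2 / 4τ) / (4πτ)^{1/2}, which is the normalization for which the τ-derivative equals the second x-derivative. The function names are hypothetical.

```python
import math

def f(y):
    """Initial heat distribution: 1 on [-1, 1], 0 elsewhere (arbitrary choice)."""
    return 1.0 if -1.0 <= y <= 1.0 else 0.0

def heat(x, tau, a=-1.0, b=1.0, steps=4000):
    """F_tau(x): Gaussian-weighted average of f, integrated over [a, b]
    (where f is nonzero) with the midpoint rule."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        y = a + (k + 0.5) * h
        total += math.exp(-(x - y) ** 2 / (4.0 * tau)) * f(y)
    return total * h / math.sqrt(4.0 * math.pi * tau)

def pde_residual(x, tau, d=1e-2):
    """Finite-difference estimate of dF/dtau - d^2F/dx^2 (should be near 0)."""
    dF_dtau = (heat(x, tau + d) - heat(x, tau - d)) / (2 * d)
    d2F_dx2 = (heat(x + d, tau) - 2 * heat(x, tau) + heat(x - d, tau)) / d ** 2
    return dF_dtau - d2F_dx2

for tau in (0.1, 1.0, 10.0, 100.0):
    print(tau, round(heat(0.0, tau), 5))
print("residual:", pde_residual(0.3, 1.0))
```

The values at x = 0 decrease steadily toward zero as τ grows, and the small residual is a numerical check that this family does satisfy the heat equation in the stated normalization.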

The time-dependent family of geometric structures 
that Hamilton introduced and Perelman used is defined 
by an equation that relates the time-derivative of the 
geometric structure at any given time to its Ricci cur- 
vature, a certain natural substitute in the context of 
geometric structures for the second derivatives that 
enter the heat equation for the functions F_τ above. The
idea much studied by Hamilton and then by Perelman is 
to let the evolving geometric structure decompose the 
manifold into the canonical pieces that are predicted 
to exist by the geometrization conjecture. Perelman 
proved that the pieces required by the geometrization 
conjecture emerge as regions whose points stay
relatively close together (as measured by a certain
rescaling of the distance function) while the points in distinct
regions move farther and farther apart. 

The equation used by Perelman and Hamilton for the 
time-evolution of a geometric structure is rather
complicated. Its standard incarnation involves the notion
of a Riemannian metric [I.3 §6.10]. This appears in
any given coordinate chart on an n-dimensional
manifold as a symmetric, positive-definite n × n matrix
whose entries are functions of the coordinates. The
various components of this matrix are traditionally written
as {g_ij}_{1≤i,j≤n}. The matrix determines the geometric
structure and can in turn be derived from it. 

Hamilton and Perelman study a time-dependent
family of Riemannian metrics, τ → g_τ, where the rule
for the time dependence is obtained using an
equation for the τ-derivative of g_τ that has the schematic
form ∂_τ (g_τ)_ij = −2 R_ij[g_τ], where the R_ij are
the components of the aforementioned Ricci
curvature, a certain symmetric matrix that is determined at
any given τ by the metric g_τ. Every Riemannian
metric has a Ricci curvature; its components are standard
(nonlinear) functions of the components of the matrix 
and their first- and second-order partial derivatives in 
the coordinate directions. The Ricci curvatures for the
metrics that define the respective Euclidean, spherical,
and hyperbolic geometries have the particularly simple
form R_ij = c g_ij, where c is 0, 1, or −1, respectively. For
more about these ideas, see Ricci flow [III.80].
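
As a hedged worked example (not taken from the text), consider a family of the form g_τ = λ(τ) g, where g is a fixed metric whose Ricci curvature is c times g and λ(τ) is a positive function of τ alone. Multiplying a metric by a positive constant does not change its Ricci curvature, so the schematic flow equation reduces to an ordinary differential equation for λ:

```latex
\partial_\tau \bigl(\lambda(\tau)\, g_{ij}\bigr)
  = -2\, R_{ij}[\lambda(\tau)\, g]
  = -2c\, g_{ij}
\quad\Longrightarrow\quad
\frac{d\lambda}{d\tau} = -2c
\quad\Longrightarrow\quad
\lambda(\tau) = \lambda(0) - 2c\,\tau .
```

With the normalization used above, the Euclidean case (c = 0) does not move at all, the spherical case (c = 1) shrinks and collapses when τ reaches λ(0)/2, and the hyperbolic case (c = −1) expands for all time. Lengths scale like the square root of λ(τ).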

As was mentioned at the beginning of this part of the 
article, geometry has also played a central role in the 
developments in the classification program for smooth, 
four-dimensional manifolds. In this case, geometrically 
defined data are used to distinguish smooth structures 
on topologically equivalent manifolds. What follows is 
a very brief sketch of how this is done. 

To begin with, the idea is to introduce a geometric 
structure on the manifold and then to use the latter to 
define a canonical system of partial differential equa- 
tions. In any given coordinate chart, these equations are 
for a particular set of functions. The equations state
that certain linear combinations of the collection of 
first derivatives of the functions from the set are equal 
to terms that are linear and quadratic in the values of 
the functions themselves. In the case of the Donald- 
son invariants, and also of the newer Seiberg-Witten 
invariants, the relevant equations are nonlinear
generalizations of the Maxwell equations [IV.13 §1.1] for
electricity and magnetism. 

In any event, one then counts the solutions with
algebraic weights. The purpose of the algebraic weighting
of the count is to obtain an invariant [I.4 §2.2], that
is, a count that does not change if the given geomet- 
ric structure is changed. The point here is that the 
naive count will typically depend on the structure, but a 
suitably weighted count will not. Imagine, for example, 
that one has a continuously varying family of geomet- 
ric structures, and that new solutions appear and old 
ones disappear only in pairs, where one solution has 
been assigned weight +1 and the other -1. 

The following toy model illustrates this appearance 
and disappearance phenomenon. The equation in ques- 
tion is for a single function on the circle. That is, it will 
concern a function, f, of one variable, x, that is
periodic with period 2π. For example, take the equation
df/dx + τf − f^3 = 0, where τ is a constant that is
specified in advance. Varying τ can now be viewed as
a model for the variation of the geometric structure.
When τ > 0 there are exactly three solutions: f ≡ 0,
f ≡ τ^{1/2}, and f ≡ −τ^{1/2}. However, when τ ≤ 0, the
only solution is f ≡ 0. Thus, the number of solutions
changes as τ crosses zero. Even so, a suitable weighted
count is independent of τ.
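
One way to see the weighted count at work is sketched below in Python. Two assumptions are made that the text does not spell out. First, every periodic solution of this equation is constant (a solution of a first-order equation of this kind is monotone in x, so it can only be periodic if it is constant), so it is enough to find the roots of τf − f^3. Second, the weight of a solution is taken to be the sign of the derivative of τf − f^3 with respect to f at that solution; this is one natural choice, not necessarily the one the author has in mind. The function names are hypothetical.

```python
import math

def constant_solutions(tau):
    """Constant solutions of df/dx + tau*f - f**3 = 0, i.e. roots of tau*f - f**3."""
    if tau > 0:
        r = math.sqrt(tau)
        return [-r, 0.0, r]
    return [0.0]

def weight(f, tau):
    """Assumed weight: sign of d/df (tau*f - f**3) = tau - 3*f**2 at the solution."""
    slope = tau - 3.0 * f * f
    if slope == 0:
        raise ValueError("degenerate solution (tau = 0): weight undefined")
    return 1 if slope > 0 else -1

def counts(tau):
    """Return (naive count, weighted count) of the constant solutions."""
    sols = constant_solutions(tau)
    return len(sols), sum(weight(f, tau) for f in sols)

for tau in (-2.0, -0.5, 0.5, 2.0):
    naive, weighted = counts(tau)
    print(f"tau = {tau:+.1f}: naive count {naive}, weighted count {weighted}")
```

With this choice of weights the naive count jumps from 1 to 3 as τ passes through zero, while the weighted count is the same on both sides: the two extra solutions and the changed sign at f ≡ 0 amount to a pair with weights +1 and −1 being added.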

Let us return now to the four-dimensional story. If 
the weighted sum is independent of the chosen
geometric structure, then it depends only on the
underlying smooth structure. Therefore, if two geometric
structures on a given topological manifold provide distinct
sums, then the underlying smooth structures must be
distinct.

As I remarked earlier, Ozsváth and Szabó have
defined invariants for four-dimensional manifolds that 
are easier to use than the Seiberg-Witten invariants, but 
probably equivalent to them. These are also defined as 
the number of solutions to a particular system of
differential equations, counted in a creative way. In this case,
the equations are analogues of the Cauchy-Riemann
equations [I.3 §5.6], and the arena is a space that can
be defined after cutting the 4-manifold into simpler 
pieces. There are myriad ways to slice a 4-manifold 
in the prescribed manner, but a suitably creative,
algebraic count of solutions provides the same number for
each. 

With hindsight, one can see that the use of differ- 
ential equations to distinguish smooth structures on a 
given topological manifold makes good sense, since a 
smooth structure is needed to take a derivative in the 
first place. Even so, this author is constantly amazed by 
the fact that the Donaldson/Seiberg-Witten/Ozsváth-Szabó
strategy of algebraically counting differential
equation solutions yields counts that are both tractable 
and useful. (Getting the same count in all cases is no 
help at all.) 

Further Reading 

Those who wish to learn more about manifolds in gen- 
eral can consult J. Milnor’s book Topology from the 
Differentiable Viewpoint (Princeton University Press, 
Princeton, NJ, 1997) or the book Differential
Topology (Prentice Hall, Englewood Cliffs, NJ, 1974), by
V. Guillemin and A. Pollack. A good introduction to
the classification problem in dimensions 2 and 3 is
the book Three-Dimensional Geometry and Topology
(Princeton University Press, Princeton, NJ, 1997), by
W. Thurston. This book also has a nice discussion
of geometric structures. A full account of Perelman’s 
proof of the Poincaré conjecture can be found in Ricci 
Flow and the Poincaré Conjecture, by J. Morgan and 
G. Tian (American Mathematical Society, Providence, RI, 
2007). The story for topological 4-manifolds is told in 
the book by M. Freedman and F. Quinn titled Topology 
of 4-Manifolds (Princeton University Press, Princeton, 
NJ, 1990). There are no books available that serve as
general introductions to the smooth 4-manifold story.
A book that does introduce the Seiberg-Witten
invariants is The Seiberg-Witten Equations and Applications
to the Topology of Smooth Four-Manifolds (Princeton 
University Press, Princeton, NJ, 1995), by J. Morgan. 
Meanwhile, the Donaldson invariants are discussed in 
detail in the book by Donaldson and P. Kronheimer 
titled Geometry of Four-Manifolds (Oxford University 
Press, Oxford, 1990). Finally, parts of the story for 
dimensions greater than 4 are told in Lectures on 
the h-Cobordism Theorem (Princeton University Press, 
Princeton, NJ, 1965), by J. Milnor, and Foundational 
Essays on Topological Manifolds, Smoothings and Tri- 
angulations (Princeton University Press, Princeton, NJ, 
1977), by R. Kirby and L. Siebenmann.


IV.8 Moduli Spaces 

David D. Ben-Zvi 


Many of the most important problems in mathemat- 
ics concern classification [1.4 §2]. One has a class of 
mathematical objects and a notion of when two objects 
should count as equivalent. It may well be that two 
equivalent objects look superficially very different, so 
one wishes to describe them in such a way that equiva- 
lent objects have the same description and inequivalent 
objects have different descriptions. 

Moduli spaces can be thought of as geometric solu- 
tions to geometric classification problems. In this arti- 
cle we shall illustrate some of the key features of mod- 
uli spaces, with an emphasis on the moduli spaces of 
Riemann surfaces [III.81]. In broad terms, a moduli
problem consists of three ingredients. 

Objects: which geometric objects would we like to 
describe, or parametrize?

Equivalences: when do we identify two of our objects 
as being isomorphic, or “the same”? 

Families: how do we allow our objects to vary, or 
modulate? 

In this article we will discuss what these ingredients sig- 
nify, as well as what it means to solve a moduli problem, 
and we will give some indications as to why this might 
be a good thing to do. 

Moduli spaces arise throughout algebraic geometry
[IV.4], differential geometry, and algebraic
topology [IV.6]. (Moduli spaces in topology are often
referred to as classifying spaces.) The basic idea is to 
give a geometric structure to the totality of the objects 
we are trying to classify. If we can understand this
geometric structure, then we obtain powerful insights into
the geometry of the objects themselves. Furthermore,
moduli spaces are rich geometric objects in their own 
right. They are “meaningful” spaces, in that any state- 
ment about their geometry has a “modular” interpreta- 
tion, in terms of the original classification problem. As a 
result, when one investigates them one can often reach 
much further than one can with other spaces. Moduli 
spaces such as the moduli of elliptic curves [III.21] 
(which we discuss below) play a central role in a variety
of areas that have no immediate link to the geometry
being classified, in particular in algebraic number
theory [IV.1] and algebraic topology. Moreover,
the study of moduli spaces has benefited tremendously
in recent years from interactions with physics (in
particular with string theory [IV.17 §2]). These
interactions have led to a variety of new questions and new
techniques. 

1 Warmup: The Moduli Space 
of Lines in the Plane 

Let us begin with a problem that looks rather simple, 
but that nevertheless illustrates many of the important 
ideas of moduli spaces. 

Problem. Describe the collection of all lines in the real
plane $\mathbb{R}^2$ that pass through the origin.

To save writing, we are using the word “line” to mean 
“line that passes through the origin.” This classification 
problem is easily solved by assigning to each line L an 
essential parameter, or modulus, a quantity that we can
calculate for each line and that will help us tell different
lines apart. All we have to do is take standard Cartesian
coordinates $x, y$ on the plane and measure the angle
$\theta(L)$ between the line $L$ and the $x$-axis, taken in
counterclockwise fashion. We find that the possible values
of $\theta$ are those for which $0 \le \theta < \pi$, and that for
every such $\theta$ there is exactly one line $L$ that makes an
angle of $\theta$ with the $x$-axis. So as a set, we have a complete
solution to our classification problem: the set of lines
$L$, known as the real projective line $\mathbb{RP}^1$, is in one-to-one
correspondence with the half-open interval $[0, \pi)$.
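The modulus of a line can be computed from any nonzero vector lying on it. The following Python fragment is a minimal sketch of this, not part of the original discussion: the name `line_angle` and the choice to describe a line by a direction vector `(vx, vy)` are assumptions made for illustration.

```python
import math

def line_angle(vx: float, vy: float) -> float:
    """Return theta(L) in [0, pi) for the line through 0 spanned by (vx, vy)."""
    if vx == 0 and vy == 0:
        raise ValueError("a line needs a nonzero direction vector")
    # atan2 gives an angle in (-pi, pi]; reducing mod pi forgets the sign of
    # the vector, since v and -v span the same line.
    theta = math.atan2(vy, vx) % math.pi
    # Rounding can push a tiny negative angle up to pi itself; fold it back to 0.
    return 0.0 if theta >= math.pi else theta
```

Note that `line_angle(1, 1)` and `line_angle(-1, -1)` agree up to rounding, as they must: the two vectors point in opposite directions along the same line.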

However, we are seeking a geometric solution to the 
classification problem. What does this entail? We have 
a natural notion of when two lines are near each other, 
which our solution should capture— in other words, the 
collection of lines has a natural topology [III.92]. So 
far, our solution does not reflect the fact that lines
$L$ for which the angle $\theta(L)$ is close to $\pi$ are almost
horizontal: they are therefore close to the $x$-axis (for
which $\theta = 0$) and to the lines $L$ with $\theta(L)$ close to zero.
We need to find some way of “wrapping around” the
interval $[0, \pi)$ so that $\pi$ becomes close to $0$.

One way to do this is to take not the half-open interval
$[0, \pi)$ but the closed interval $[0, \pi]$, and then to
“identify” the points $0$ and $\pi$. (This idea can easily be
made formal by defining an appropriate equivalence
relation [1.2 §2.3].) If $\pi$ and $0$ are regarded as the
same, then numbers close to $\pi$ are close to numbers
close to $0$. This is a way of saying that if you attach the
two ends of a line segment together, then, topologically 
speaking, you obtain a circle. 
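One way to see the effect of identifying the two endpoints is to measure the distance between two lines by the shorter of the two arcs joining their angles on the resulting circle. The sketch below does this in Python; the function name `angle_distance` is hypothetical, and the choice of arc length as the notion of closeness is an assumption that is compatible with, but not dictated by, the discussion above.

```python
import math

def angle_distance(theta1: float, theta2: float) -> float:
    """Distance between two lines given by angles, with 0 and pi identified."""
    gap = abs(theta1 - theta2) % math.pi
    # On a circle of circumference pi, one can travel either way round.
    return min(gap, math.pi - gap)
```

With this convention, lines at angles $0.01$ and $\pi - 0.01$ are at distance about $0.02$, even though the two numbers are far apart in the interval.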

A more natural way of achieving the same end is
suggested by the following geometric construction of $\mathbb{RP}^1$.
Consider the unit circle $S^1 \subset \mathbb{R}^2$. To each point $s \in S^1$,
there is an obvious way of assigning a line $L(s)$: take
the line that passes through $s$ and the origin. Thus, we
have a family of lines parametrized by $S^1$, that is, a map
(or function) $s \mapsto L(s)$ that takes points in $S^1$ to lines
in our set $\mathbb{RP}^1$. What is important about this is that we
already know what it means for two points in $S^1$ to be
close to each other, and the map $s \mapsto L(s)$ is continuous.
However, this map is a two-to-one function rather
than a bijection, since $s$ and $-s$ always give the same
line. To remedy this, we can identify each $s$ in the circle
$S^1$ with its antipodal point $-s$. We then have a one-to-one
correspondence between $\mathbb{RP}^1$ and the resulting
quotient space [1.3 §3.3] (which again is topologically
a circle), and this correspondence is continuous in both
directions.
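A concrete way to see that the quotient is again a circle is to regard $S^1$ as the unit complex numbers and to square them. This squaring map is a standard device that we introduce here as an assumption; it is not taken from the construction above. The hypothetical function `line_label` below sends $s$ and $-s$ to the same point of the circle, and sends points on different lines to different points.

```python
import cmath

def line_label(s: complex) -> complex:
    """Send a unit complex number s to s*s, which is unchanged under s -> -s."""
    return s * s

# Two antipodal points on the unit circle, hence the same line through 0.
s = cmath.exp(0.7j)
same = abs(line_label(s) - line_label(-s))                 # zero up to rounding
# A point on a different line receives a different label.
other = abs(line_label(s) - line_label(cmath.exp(0.2j)))   # clearly nonzero
```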

The key feature of the space $\mathbb{RP}^1$, considered as the
moduli space of lines in the plane, is that it captures
the ways in which lines can modulate, or vary continuously
in families. But when do families of lines arise?
A good example is provided by the following construction.
Whenever we have a continuous curve $C \subset \mathbb{R}^2 \setminus 0$
in the plane, we can assign to each point $c$ in $C$ the line
$L(c)$ that passes through $0$ and $c$. This gives us a family
of lines parametrized by $C$. Moreover, the function that
takes $c$ to $L(c)$ is a continuous function from $C$ to $\mathbb{RP}^1$,
so the parametrization is a continuous one.

Suppose, for example, that $C$ is a copy of $\mathbb{R}$ realized as
the set of points $(x, 1)$ at height $1$. Then the map from
$C$ to $\mathbb{RP}^1$ gives an isomorphism between $\mathbb{R}$ and the set
$\{L : \theta(L) \ne 0\}$, which is the subset of $\mathbb{RP}^1$ consisting
of all lines apart from the $x$-axis. Put more abstractly,
we have an intuitive notion of what it means for a
collection of lines through the origin to depend continuously
on some parameters, and this notion is captured
precisely by the geometry of $\mathbb{RP}^1$: for instance, if you
tell me you have a continuous 37-parameter family of
lines in $\mathbb{R}^2$, this is the same as saying that you have a
continuous map from $\mathbb{R}^{37}$ to $\mathbb{RP}^1$, which sends a point
$v \in \mathbb{R}^{37}$ to a line $L(v) \in \mathbb{RP}^1$. (More concretely, we
could say that the real function $v \mapsto \theta(L(v))$ on $\mathbb{R}^{37}$
is continuous away from the locus where $\theta$ is close to
$\pi$. Near this locus we could use instead the function $\varphi$
that measures the angle from the $y$-axis.)
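The need for a second angle function can be checked numerically. In the Python sketch below, the names `theta` and `phi` and the representation of a line by a direction vector are assumptions of ours; `phi` is taken to be the counterclockwise angle from the $y$-axis, reduced to $[0, \pi)$. Two nearby lines on either side of the $x$-axis have very different values of `theta` but nearly equal values of `phi`.

```python
import math

def theta(vx: float, vy: float) -> float:
    """Angle from the x-axis, in [0, pi) up to rounding; jumps across the x-axis."""
    return math.atan2(vy, vx) % math.pi

def phi(vx: float, vy: float) -> float:
    """Angle from the y-axis, in [0, pi) up to rounding; jumps across the y-axis."""
    return (math.atan2(vy, vx) - math.pi / 2) % math.pi

eps = 1e-3
above, below = (1.0, eps), (1.0, -eps)   # nearby lines straddling the x-axis
theta_jump = abs(theta(*above) - theta(*below))   # close to pi
phi_jump = abs(phi(*above) - phi(*below))         # close to 2 * eps
```

On the curve of points $(x, 1)$ at height $1$, by contrast, `theta(x, 1.0)` stays strictly between $0$ and $\pi$ and decreases as $x$ increases, so a single coordinate suffices there.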

1.1 Other Families 

The idea of families of lines leads to various other
geometric structures on the space $\mathbb{RP}^1$, and not just its
topological structure. For example, we have the notion
of a differentiable family of lines in the plane, which
is a family of lines for which the angles vary differentiably.
(The same ideas apply if we replace “differentiable”
by “measurable,” “$C^\infty$,” “real analytic,” etc.) To
parametrize such a family appropriately, we would like
$\mathbb{RP}^1$ to be a differentiable manifold [1.3 §6.9], so that
we can calculate derivatives of functions on it. Such a
structure on $\mathbb{RP}^1$ can be specified by using the angle
functions $\theta$ and $\varphi$ defined in the previous section. The
function $\theta$ gives us a coordinate for lines that are not
too close to the $x$-axis, and $\varphi$ gives us a coordinate for
lines that are not too close to the $y$-axis. We can calculate
derivatives of functions on $\mathbb{RP}^1$ by writing them
in terms of these coordinates. One can justify this
differentiable structure on $\mathbb{RP}^1$ by checking that for any
differentiable curve $C \subset \mathbb{R}^2 \setminus 0$ the map $c \mapsto L(c)$ comes
out as differentiable. This means that if $L(c)$ is not close
to the $x$-axis, then the function $x \mapsto \theta(L(x))$ is
differentiable at $x = c$, and similarly for $\varphi$ and the $y$-axis.
The functions $x \mapsto \theta(L(x))$ and $x \mapsto \varphi(L(x))$ are called
pullbacks, because they are the result of converting, or
“pulling back,” $\theta$ and $\varphi$ from functions defined on $\mathbb{RP}^1$
to functions defined on $C$.

We now can state the fundamental property of $\mathbb{RP}^1$
as a differentiable space.

A differentiable family of lines in $\mathbb{R}^2$ parametrized by a
differentiable manifold $X$ is the same thing as a function
from $X$ to $\mathbb{RP}^1$, taking a point $x$ to a line $L(x)$,
such that the pullbacks $x \mapsto \theta(L(x))$ and $x \mapsto \varphi(L(x))$
of the functions $\theta$, $\varphi$ are differentiable functions.

We say that $\mathbb{RP}^1$ (with its differentiable structure) is
the moduli space of (differentiably varying families of)
lines in $\mathbb{R}^2$. This means that $\mathbb{RP}^1$ carries the universal
differentiable family of lines. From the very definition,
we have assigned to each point of $\mathbb{RP}^1$ a line in $\mathbb{R}^2$, and
these lines vary differentiably as we vary the point. The
above assertion says that any differentiable family of
lines, parametrized by a space $X$, is described by giving
a map $f : X \to \mathbb{RP}^1$ and assigning to $x \in X$ the
line $L(f(x))$.

1.2 Reformulation: Line Bundles 

It is interesting to reformulate the notion of a (continuous
or differentiable) family of lines as follows. Let $X$ be
a space and let $x \mapsto L(x)$ be an assignment of lines to
points in $X$. For each point $x \in X$, we place a copy of $\mathbb{R}^2$
at $x$; in other words, we consider the Cartesian product
$X \times \mathbb{R}^2$. We may now visualize the line $L(x)$ as living in
the copy of $\mathbb{R}^2$ that lies over $x$. This gives us a
continuously varying collection of lines $L(x)$ parametrized
by $x \in X$, otherwise known as a line bundle over $X$.
Moreover, this line bundle is embedded in the “trivial”
vector bundle [IV.6 §5] $X \times \mathbb{R}^2$, which is the constant
assignment that takes each $x$ to the plane $\mathbb{R}^2$. In the
case when $X$ is $\mathbb{RP}^1$ itself, we have a “tautological” line
bundle: to each point $s \in \mathbb{RP}^1$, which we can think of as
a line $L_s$ in $\mathbb{R}^2$, it assigns that very same line $L_s$.

Proposition. For any topological space $X$ there is a
natural bijection between the following two sets:

(i) the set of continuous functions $f : X \to \mathbb{RP}^1$; and

(ii) the set of line bundles on $X$ that are contained in
the trivial vector bundle $X \times \mathbb{R}^2$.

This bijection sends a function $f$ to the corresponding
pullback of the tautological line bundle on $\mathbb{RP}^1$.
That is, the function $f$ is mapped to the line bundle
$x \mapsto L_{f(x)}$. (This is a pullback because it converts $L$
from a function defined on $\mathbb{RP}^1$ to a function defined
on $X$.)

Thus, the space $\mathbb{RP}^1$ carries the universal line bundle
that sits in the trivial $\mathbb{R}^2$ bundle: any time we have
a line bundle sitting in the trivial $\mathbb{R}^2$ bundle, we can
obtain it by pulling back the universal (tautological)
example on $\mathbb{RP}^1$.
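The passage from a function to a line bundle can be imitated in a few lines of Python. This is only a sketch under assumptions of our own: a point of $\mathbb{RP}^1$ is represented by its angle, a line by a spanning vector, and the names `tautological_line`, `pullback`, and `in_bundle` are hypothetical.

```python
import math

def tautological_line(theta):
    """Spanning vector of the line in R^2 that the point theta of RP^1 stands for."""
    return (math.cos(theta), math.sin(theta))

def pullback(f):
    """Turn a map f from X to RP^1 (given as an angle) into the family x -> L_{f(x)}."""
    return lambda x: tautological_line(f(x))

def in_bundle(family, x, v, tol=1e-9):
    """Check whether (x, v) lies in the sub-bundle of X x R^2 cut out by the family."""
    dx, dy = family(x)
    # v lies on the line over x exactly when it is parallel to the spanning vector.
    return abs(dx * v[1] - dy * v[0]) <= tol

# Example: X is the real line and the line over x makes angle x/2 with the x-axis.
family = pullback(lambda x: (x / 2) % math.pi)
```

Here the total space of the bundle is never stored; membership of a pair $(x, v)$ is decided by looking up the line over $x$, which is all that the pullback records.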

1.3 Invariants of Families 

Associated with any continuous function $f$ from the
circle $S^1$ to itself is an integer known as its degree.
Roughly speaking, the degree of $f$ is the number of
times $f(x)$ goes around the circle when $x$ goes around
once. (If it goes backwards $n$ times, then we say that the
degree is $-n$.) Another way to think of the degree is as
the number of times a typical point in $S^1$ is passed by
$f(x)$ as $x$ goes around the circle, where we count this
as $+1$ if it is passed in the counterclockwise direction
and $-1$ if it is passed in the clockwise direction.

Earlier, we showed that the circle $S^1$, which we
obtained by identifying the endpoints of the closed
interval $[0, \pi]$, could be used to parametrize the moduli
space $\mathbb{RP}^1$ of lines. Combining this with the notion
of degree, we can draw some interesting conclusions.
In particular, we can define the notion of winding numbers.
Suppose that we are given a continuous function
$\gamma$ from the circle $S^1$ into the plane $\mathbb{R}^2$ and suppose that
it avoids $0$. The image of this map will be a closed loop
$C$ (which may cross itself). This defines for us a map
from $S^1$ to itself: first do $\gamma$ to obtain a point $c$ in $C$,
then work out $L(c)$, which belongs to $\mathbb{RP}^1$, and finally
use the parametrization of $\mathbb{RP}^1$ to associate with $L(c)$ a
point in $S^1$ again. The degree of the resulting composite
map will be twice the number of times that $\gamma$, and
hence $C$, winds around $0$, so half this number is defined
to be the winding number of $\gamma$.
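This recipe can be followed literally on a sampled loop. The Python sketch below is an illustration under assumptions of our own: the loop is given by finitely many sample points, taken densely enough that consecutive angles differ by less than a quarter turn, and the names `degree` and `winding_number` are hypothetical.

```python
import math

def degree(angles, period):
    """Degree of a closed loop of angles on a circle of the given circumference.

    Each step is replaced by the shortest signed jump, so consecutive samples
    must differ by less than half a period.
    """
    total = 0.0
    n = len(angles)
    for k in range(n):
        step = (angles[(k + 1) % n] - angles[k]) % period
        if step > period / 2:
            step -= period          # a jump backwards, not a long jump forwards
        total += step
    return round(total / period)

def winding_number(points):
    """Winding number around 0 of a closed loop sampled at the given (x, y) points."""
    # theta(L(c)) for each sample c: the angle of the line through 0 and c.
    line_angles = [math.atan2(y, x) % math.pi for x, y in points]
    # The loop of lines has twice the degree, because RP^1 has circumference pi.
    return degree(line_angles, math.pi) // 2
```

For the unit circle traversed once counterclockwise, the loop of lines goes around $\mathbb{RP}^1$ twice, and halving recovers the expected winding number.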

More generally, given a family of lines in $\mathbb{R}^2$ parametrized
by some space $X$, we would like to measure the
“manner in which $X$ winds around the circle.” To be precise,
given a function $\varphi$ from $X$ to $\mathbb{RP}^1$, which defines
the parametrized family of lines, we would like to be
able to say, for any map $f : S^1 \to X$, what the winding
number is of the composition $\varphi f$, which takes a
point $x$ in $S^1$ to its image $f(x)$ in $X$ and from there
to the corresponding line $\varphi(f(x))$ in the family. Thus,
the map $\varphi$ gives us a way of assigning to each function
$f : S^1 \to X$ an integer, the winding number of
$\varphi f$. The way this assignment works does not change if
$\varphi$ is continuously deformed: that is, it is a topological
invariant of $\varphi$. What it does depend on is the class that
$\varphi$ belongs to in the first cohomology group [IV.6 §4]
of $X$, $H^1(X, \mathbb{Z})$. Equivalently, to any line bundle on a
space $X$ which is contained in the trivial $\mathbb{R}^2$-bundle,
we have associated a cohomology class, known as the
Euler class of the bundle. This is the first example of
a characteristic class [IV.6 §5] for vector bundles.
It demonstrates that if we understand the topology of
moduli spaces of classes of geometric objects, then we
can define topological invariants for families of those
objects.


2 The Moduli of Curves 
and Teichmüller Spaces

We now turn our attention to perhaps the most famous 
examples of moduli spaces, the moduli spaces of
curves, and their first cousins, the Teichmüller spaces.

These moduli spaces are the geometric solution to the 
problem of classification of compact Riemann surfaces, 
and can be thought of as the “higher theory” of Rie- 
mann surfaces. The moduli spaces are “meaningful 
spaces,” in that each of their points stands for a Rie- 
mann surface. As a result, any statement about their 
geometry tells us something about the geometry of 
Riemann surfaces. 

We turn first to the objects. Recall that a Riemann 
surface is a topological surface X (connected and ori- 
ented) to which a complex structure has been given. 

Complex structures can be described in many ways, 
and they enable us to do complex analysis, geometry, 
and algebra on the surface X. In particular, they enable 
us to define holomorphic [1.3 §5.6] (complex-analytic) 
and meromorphic functions [V.34] on open subsets 
of X. To be precise, X is a two-dimensional manifold, 
but the charts are thought of as open subsets of $\mathbb{C}$
rather than of $\mathbb{R}^2$, and the maps that glue them together
are required to be holomorphic. An equivalent notion 
is that of a conformal structure on X, which is the 
structure needed to make it possible to define angles 
between curves in X. Yet another important equivalent 
notion is that of algebraic structure on X, making X 
into a complex-algebraic curve (leading to the persis- 
tent confusion in terminology: a Riemann surface is two 
dimensional, and therefore a surface, from the point of 
view of topology or the real numbers, but one dimen- 
sional, and therefore a curve, from the point of view of 
complex analysis and algebra). An algebraic structure is 
what allows us to speak of polynomial, rational, or alge- 
braic functions on X, and is usually specified by real- 
izing X as the set of solutions to polynomial equations 
in complex projective space [III.74] $\mathbb{CP}^2$ (or $\mathbb{CP}^n$).

In order to speak of a classification problem, let alone 
a moduli space, for Riemann surfaces we must next 
specify when we regard two Riemann surfaces as equiv- 
alent. (We postpone the discussion of the final ingre- 
dient, the notion of families of Riemann surfaces, to 
section 2.2.) To do this, we must give a notion of iso- 
morphism between Riemann surfaces: when should two 
Riemann surfaces X and Y be “identified,” or thought
of as giving two equivalent realizations of the same
underlying object of our classification? This issue was
hidden in our toy example of classifying lines in the
plane: there we simply identified two lines if and only
if they were equal as lines in the plane. This naive
option is not available to us with the more abstractly
defined Riemann surfaces. If we considered Riemann
surfaces realized concretely as subsets of some larger
space— for example, as solution sets to algebraic equa- 
tions in complex projective space— we could similarly 
choose to identify surfaces only if they were equal as 
subsets. However, this is too fine a classification for 
most applications: what we care about is the intrinsic 
geometry of Riemann surfaces, and not incidental fea- 
tures that result from the particular way we choose to 
realize them. 

At the other extreme, we might choose to ignore the 
extra geometric structure that makes a surface into a 
Riemann surface. That is, we could identify two Riemann
surfaces X and Y if they are topologically equivalent,
or homeomorphic (the “coffee mug is a doughnut”
perspective). The classification of compact Riemann
surfaces up to topological equivalence is captured by a
single nonnegative integer, the genus g (“number of holes”)
of the surface. Any surface of genus zero is homeomorphic
to the Riemann sphere $\mathbb{CP}^1 \simeq S^2$, any surface of
genus 1 is homeomorphic to a torus $S^1 \times S^1$, and so on.
Thus, in this case there is no issue of “modulation”— 
the classification is solved by giving a list of possible 
values of a single discrete invariant. 

However, if we are interested in Riemann surfaces 
as Riemann surfaces rather than simply as topological 
manifolds, then this classification is too crude: it com- 
pletely ignores the complex structure. We would now 
like to refine our classification to remedy this defect. To
this end, we say that two Riemann surfaces X and Y are 
(conformally, or holomorphically) equivalent if there is 
a topological equivalence between them that preserves 
the geometry, i.e., a homeomorphism that preserves the 
angles between curves, or takes holomorphic functions 
to holomorphic functions, or takes rational functions 
to rational functions. (These conditions are all equiv- 
alent.) Note that we still have at our disposal our dis- 
crete invariant— the genus of a surface. However, as we 
shall see, this invariant is not fine enough to distinguish 
between all inequivalent Riemann surfaces. In fact, it is
possible to have families of inequivalent Riemann sur- 
faces that are parametrized by continuous parameters 
(but we cannot make proper sense of this idea until we 
have said precisely what is meant by a family of Rie- 
mann surfaces). Thus, the next step is to fix our discrete 
invariant and to try to classify all the different isomor- 
phism classes of Riemann surfaces with the same genus 
by assembling them in a natural geometric fashion. 

An important step toward this classification is the 
uniformization theorem [V.37]. This states that any
simply connected Riemann surface is holomorphically
isomorphic to one of the following three: the Riemann
sphere $\mathbb{CP}^1$, the complex plane $\mathbb{C}$, or the upper
half-plane $\mathbb{H}$ (equivalently, the unit disk $D$). Since the
universal covering space [III.95] of any Riemann surface
is a simply connected Riemann surface, the
uniformization theorem provides an approach to classifying
arbitrary Riemann surfaces. For instance, any
compact [III.9] Riemann surface of genus zero is simply
connected, and in fact homeomorphic to the Riemann
sphere, so the uniformization theorem already
solves our classification problem in genus zero: up to
equivalence, $\mathbb{CP}^1$ is the only Riemann surface of genus
zero, and so in this case the topological and conformal
classifications agree.

2.1 Moduli of Elliptic Curves 

Next, we consider Riemann surfaces whose universal
cover is $\mathbb{C}$, which is the same as saying that they are
quotients of $\mathbb{C}$. For example, we can look at a quotient
of $\mathbb{C}$ by $\mathbb{Z}$, which means that we regard two complex
numbers $z$ and $w$ as equivalent if $z - w$ is an integer.
This has the effect of “wrapping $\mathbb{C}$ around” into a cylinder.
Cylinders are not compact, but to get a compact
surface we could take a quotient by $\mathbb{Z}^2$ instead: that
is, we could regard $z$ and $w$ as equivalent if their difference
is of the form $a + bi$, where $a$ and $b$ are both
integers. Now $\mathbb{C}$ is wrapped around in two directions
and the result is a torus with a complex (or, equivalently,
conformal or algebraic) structure. This is a compact
Riemann surface of genus 1. More generally, we
can replace $\mathbb{Z}^2$ by any lattice $L$, regarding $z$ and $w$ as
equivalent if $z - w$ belongs to $L$. (A lattice $L$ in $\mathbb{C}$ is an
additive subgroup of $\mathbb{C}$ with two properties. First, it is
not contained in any line. Second, it is discrete, which
means that there is a constant $d > 0$ such that the distance
between any two distinct points in $L$ is at least $d$. Lattices
are also discussed in the general goals of mathematical
research [1.4 §4]. A basis for a lattice $L$ is a
pair of complex numbers $u$ and $v$ belonging to $L$ such
that every $z$ in $L$ can be written in the form $au + bv$
with $a$ and $b$ integers. Such a basis will not be unique:
for example, if $L = \mathbb{Z} \oplus \mathbb{Z}i$, then the obvious basis is $u = 1$
and $v = i$, but $u = 1$ and $v = 1 + i$ would do just as
well.) If we take a quotient of $\mathbb{C}$ by a lattice, then we
again obtain a torus with complex structure. It turns
out that any compact Riemann surface of genus 1 can
be produced in this way.
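That two different bases can generate the same lattice is easy to check by machine: each basis must be expressible through the other with integer coefficients. The following Python sketch does this by solving a two-by-two real linear system; the function names are hypothetical, and the numerical tolerance is an assumption suited to small examples rather than a general-purpose choice.

```python
def coordinates(z, u, v):
    """Real coefficients (a, b) with z = a*u + b*v, for u, v not on one real line."""
    det = u.real * v.imag - u.imag * v.real
    if det == 0:
        raise ValueError("u and v lie on one line through 0, so they span no lattice")
    a = (z.real * v.imag - z.imag * v.real) / det   # Cramer's rule
    b = (u.real * z.imag - u.imag * z.real) / det
    return a, b

def is_integer_combination(z, u, v, tol=1e-9):
    """True if z = a*u + b*v with a and b integers (up to the tolerance)."""
    a, b = coordinates(z, u, v)
    return abs(a - round(a)) <= tol and abs(b - round(b)) <= tol

def same_lattice(basis1, basis2):
    """True if the two bases generate the same lattice: each contains the other."""
    (u1, v1), (u2, v2) = basis1, basis2
    return (all(is_integer_combination(z, u1, v1) for z in (u2, v2))
            and all(is_integer_combination(z, u2, v2) for z in (u1, v1)))
```

For the two bases mentioned above, `same_lattice((1 + 0j, 1j), (1 + 0j, 1 + 1j))` holds, whereas replacing $i$ by $2i$ gives a strictly smaller lattice.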

From a topological point of view, any two tori are the
same, but once we consider the complex structure we
start to find that different choices of lattice may lead
to different Riemann surfaces. Certain changes to $L$ do
not have an effect: for example, if we multiply a lattice
$L$ by some nonzero complex number $\lambda$, then the
quotient surface $\mathbb{C}/L$ will not be affected. That is, $\mathbb{C}/L$
is naturally isomorphic to $\mathbb{C}/\lambda L$. Therefore, we need
only worry about the difference between lattices when
one is not a multiple of the other. Geometrically, this
says that one cannot be obtained from the other by a
combination of rotation and dilation.

Notice that by taking the quotient $\mathbb{C}/L$ we obtain not
just a “naked” Riemann surface, but one equipped with
an “origin,” that is, a distinguished point $e \in E$, which
is the image of the origin $0 \in \mathbb{C}$. In other words, we
obtain an elliptic curve:

Definition. An elliptic curve (over $\mathbb{C}$) is a Riemann surface
$E$ of genus 1, equipped with a marked point $e \in E$.
Elliptic curves, up to isomorphism, are in bijection with
lattices $L \subset \mathbb{C}$ up to rotation and dilation.

Remark. In fact, since $L \subset \mathbb{C}$ is a subgroup of the Abelian
group $\mathbb{C}$, the elliptic curve $E = \mathbb{C}/L$ is naturally an
Abelian group, with $e$ as its identity element. This is
an important motivation for keeping $e$ as part of the
data that defines an elliptic curve. A more subtle reason
for remembering the location of $e$ when we speak
of $E$ is that it helps us to define $E$ more uniquely. This
is useful, because any surface $E$ of genus 1 has lots
of symmetries, or automorphisms [1.3 §4.1]: there is
always a holomorphic automorphism of $E$ taking any
point $x$ to any other given point $y$. (If we think of $E$
as a group, these are achieved by translations.) Thus,
if someone hands us another genus-1 surface $E'$, there
may be no way to identify $E$ with $E'$, or there may be
infinitely many ways: we can always compose a given
isomorphism between them with a self-symmetry of $E$.
As we will discuss later, automorphisms haunt almost
every moduli problem, and are crucial when we consider
the behavior of families. It is usually convenient
to “rigidify” the situation somewhat, so that the possible
isomorphisms between different objects are less
“floppy” and more uniquely determined. In the case of
elliptic curves, distinguishing the point $e$ achieves this
by reducing the symmetry of $E$. Once we do that, there
is usually at most one way to identify two elliptic curves
(one way, that is, that takes origin to origin).

We see that Riemann surfaces of genus 1 (with the
choice of a marked point) can be described by concrete
“linear algebra data”: a lattice $L \subset \mathbb{C}$, or rather the
equivalence class consisting of all nonzero scalar multiples
$\lambda L$ of $L$. This is the ideal setting to study a classification,
or moduli, problem. The next step is to find an
explicit parametrization of the collection of all lattices,
up to multiplication, and to decide in what sense we
have obtained a geometric solution to the classification
problem.

In order to parametrize the collection of lattices, we
follow a procedure used for all moduli problems: first
parametrize lattices together with the choice of some
additional structure, and then see what happens when
we forget this choice. For every lattice $L$ we choose a
basis $\omega_1, \omega_2 \in L$: that is, we represent $L$ as the set
of all integer combinations $a\omega_1 + b\omega_2$. We do this in
an oriented fashion: we require that the fundamental
parallelogram spanned by $\omega_1$ and $\omega_2$ is positively oriented.
(That is, the numbers $0$, $\omega_1$, $\omega_1 + \omega_2$, and $\omega_2$ list
the vertices of the parallelogram in a counterclockwise
order. From the geometric point of view of the elliptic
curve $E$, $L$ is the fundamental group [IV.6 §2] of $E$,
and the orientation condition says that we generate $L$
by two loops, or “meridians,” $A = \omega_1$, $B = \omega_2$, which
are oriented, in that their oriented intersection number
$A \cap B$ is equal to $+1$ rather than $-1$.) Since we are
interested in lattices only up to multiplication, we can
multiply $L$ by a complex number so as to turn $\omega_1$ into
$1$ and hence $\omega_2$ into $\omega = \omega_2/\omega_1$. The orientation condition
now says that $\omega$ is in the upper half-plane $\mathbb{H}$: i.e.,
its imaginary part is positive, $\operatorname{Im} \omega > 0$. Conversely, any
complex number $\omega \in \mathbb{H}$ in the upper half-plane determines
a unique oriented lattice $L = \mathbb{Z}1 \oplus \mathbb{Z}\omega$ (that is,
the set of all integer combinations $a + b\omega$ of $1$ and $\omega$)
and no two of these lattices, taken together with their
bases, are related by a rotation or dilation.
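The normalization just described is a one-line computation, and it is instructive to check that it does not see the scale or rotation of the lattice. The Python sketch below is an illustration only; the name `normalize_basis` and the decision to reject bases that are not positively oriented are assumptions of ours.

```python
def normalize_basis(w1: complex, w2: complex) -> complex:
    """Rescale the lattice so that w1 becomes 1, and return omega = w2 / w1."""
    if w1 == 0:
        raise ValueError("w1 must be nonzero")
    omega = w2 / w1
    # Positive orientation of (w1, w2) is exactly the condition Im(omega) > 0.
    if omega.imag <= 0:
        raise ValueError("basis is not positively oriented")
    return omega

# Multiplying both basis vectors by the same nonzero number leaves omega unchanged.
lam = 2 - 3j
omega_plain = normalize_basis(1 + 0j, 1j)
omega_scaled = normalize_basis(lam * (1 + 0j), lam * 1j)
```

Swapping the two basis vectors reverses the orientation, so `normalize_basis(1j, 1 + 0j)` raises an error in this sketch rather than returning a point of the lower half-plane.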

What does this tell us about elliptic curves? We saw
earlier that an elliptic curve is defined by a lattice $L$ and
an identity $e$. Now we have seen that if we give $L$ some
extra structure, namely an oriented basis, then we can
parametrize it by a complex number $\omega \in \mathbb{H}$. This makes
precise for us the “additional structure” that we want
to place on elliptic curves. We say that a marked elliptic
curve is an elliptic curve $(E, e)$ together with the choice
of an oriented basis $\omega_1, \omega_2$ for the associated lattice
(fundamental group) $L$ of $E$. The point is that any lattice
has infinitely many different bases, which lead to many
automorphisms of $E$. By “marking” one of these bases,
we stop them being automorphisms.

2.2 Families and Teichmüller Spaces

With our new definition, we can summarize the earlier
discussion by saying that marked elliptic curves are in
bijection with points $\omega \in \mathbb{H}$ of the upper half-plane.
The upper half-plane is, however, much more than just
a set of points: it carries a host of geometric structures,
in particular a topology and a complex structure. In
what sense do these structures reflect geometric properties
of marked elliptic curves? In other words, in what
sense is the complex manifold $\mathbb{H}$, known in this context
as the Teichmüller space $T_{1,1}$ of genus-1 Riemann surfaces
with one marked point, a geometric solution to
the problem of classifying marked elliptic curves?

In order to answer this question, we need the notion
of a continuous family of Riemann surfaces, and also
the notion of a complex-analytic family. A continuous
family of Riemann surfaces parametrized by a topological
space $S$, such as the circle $S^1$, for example, is a
“continuously varying” assignment of a Riemann surface $X_s$
to every point $s$ of $S$. In our example of the moduli
of lines in the plane, a continuous family of lines was
characterized by the property that the angles between
the lines and the $x$-axis or $y$-axis defined continuous
functions of the parameters. Geometrically defined collections
of lines, such as those produced by a curve
$C$ in the plane, then gave rise to continuous families.
More abstractly, a continuous family of lines defined a
line bundle over the parameter space. A good criterion
for a family of Riemann surfaces is likewise that any
“reasonably defined” geometric quantity that we can
calculate for every Riemann surface should vary continuously
in the family. For example, a classical construction
of Riemann surfaces of genus $g$ comes from
taking $4g$-gons and gluing opposite sides together. The
resulting Riemann surface is fully determined by the
edge-lengths and angles of the polygon. Therefore, a
continuous family of Riemann surfaces described in
this fashion should be precisely a family such that the
edge-lengths and angles give continuous functions of
the parameter set.

In more abstract topological terms, if we have a collection
$\{X_s, s \in S\}$ of Riemann surfaces depending
on points in a space $S$ and we wish to make it into
a continuous family, then we should give the union
$\bigcup_{s \in S} X_s$ itself the structure of a topological space $X$,
which should simultaneously extend the topology on
each individual $X_s$. The result is called a Riemann surface
bundle. Associated with $X$ is the map that takes
each point $x$ to the particular $s$ for which $x$ belongs to
$X_s$. We should demand that this map is continuous, and
perhaps more (it could be a fibration, or fiber bundle).
This definition has the advantage of great flexibility.
For example, if $S$ is a complex manifold, then in just
the same way we can speak of a complex-analytic family
of Riemann surfaces $\{X_s, s \in S\}$ parametrized by $S$:
now we ask for the union of the $X_s$ to carry not just a
topology but a complex structure (i.e., it should form
a complex manifold), extending the complex structure
on the fibers and mapping holomorphically to
the parameter set. The same holds with “complex-analytic”
replaced by “algebraic.” These abstract definitions
have the property that if our Riemann surfaces
are described in a concrete way (cut out by equations,
glued from coordinate patches, etc.) then the
coefficients of our equations or gluing data will vary
as complex-analytic functions in our family precisely
when the family is complex analytic (and likewise for
continuous or algebraic families).

As a reality check, note that a (continuous, analytic,
or other) family of Riemann surfaces parametrized by
a single point S = {s} is indeed just a single Riemann
surface X_s. Just as in this simple case we wish to consider
Riemann surfaces only up to equivalence, so there
is a notion of equivalence or isomorphism of two analytic
families {X_s} and {X'_s} parametrized by the same
space S. We simply regard the families as equivalent if
the surfaces X_s and X'_s are isomorphic for every s, and
if the isomorphism depends analytically on s.

Armed with the notion of family, we can now for- 
mulate the characteristic property that the upper half- 
plane possesses when we think of it as the moduli space 
of marked elliptic curves. We define a continuous or 
analytic family of marked elliptic curves to be a family
where the underlying genus-1 surfaces vary continuously
or analytically, while the choice of basepoint
e_s ∈ E_s and the basis of the lattice L_s vary continuously.

The upper half-plane H plays a role for marked elliptic
curves that is similar to the role played by RP^1 for
lines in the plane. The following theorem makes this 
statement precise. 

Theorem. For any topological space S, there is a one- 
to-one correspondence between continuous maps from 
S to H and isomorphism classes of continuous families 
of marked elliptic curves parametrized by S. Similarly, 
there is a one-to-one correspondence between analytic 
maps from any complex manifold S to H and isomor- 
phism classes of analytic families of marked elliptic 
curves parametrized by S. 

If we apply the theorem in the case where S is a single
point, it simply tells us that the points of H are in bijection
with the isomorphism classes of marked elliptic
curves, as we already knew. However, it contains more
information: it says that H, with its topology and complex
structure, embodies the structure of marked elliptic
curves and the ways in which they can modulate. At
the other extreme, we could take S = H itself, mapping
S to H by the identity map. This expresses the fact that
H itself carries a family of marked elliptic curves, i.e.,
the collection of Riemann surfaces defined by ω ∈ H fit
together into a complex manifold fibering over H with
elliptic curve fibers. This family is called the universal
family, since by the theorem any family is “deduced”
(or pulled back) from this one universal example.

2.3 From Teichmüller Spaces to Moduli Spaces

We have arrived at a complete and satisfying picture
for the classification of elliptic curves when we choose
in addition a marking (that is, an oriented basis of the
associated lattice L = π_1(E)). What can we say about
elliptic curves themselves, without the choice of marking?
We somehow need to “forget” the marking, by
regarding two points of H as equivalent if they correspond
to two different markings of the same elliptic
curve.

Now, given any two bases of the group (or lattice)
Z ⊕ Z, there is an invertible 2 × 2 matrix with integer
entries that takes one basis to the other. If the two bases
are oriented, then this matrix will have determinant 1,
which means that it is an element

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z}) \]

of the group of invertible unimodular matrices over
Z. Similarly, given any two oriented bases (ω_1, ω_2)
and (ω'_1, ω'_2) of a lattice L, which can be thought of
as oriented identifications of L with Z ⊕ Z, there is a
matrix A ∈ SL_2(Z) such that ω'_1 = aω_1 + bω_2 and
ω'_2 = cω_1 + dω_2. If we now consider the normalized
bases (1, ω) and (1, ω'), where ω = ω_1/ω_2 and
ω' = ω'_1/ω'_2, then we obtain a transformation of the
upper half-plane. It is given by the formula

\[ \omega' = \frac{a\omega + b}{c\omega + d}. \]

That is, the group SL_2(Z) is acting on the upper half-plane
by linear fractional (or Möbius) transformations
with integer coefficients, and two points in the upper
half-plane correspond to the same elliptic curve if one
can be turned into the other by means of such a transformation.
If this is the case, then we should regard
the two points as equivalent: that is how we formalize
the idea of “forgetting” the marking. Note also that the
scalar matrix −Id in SL_2(Z), which negates both ω_1 and
ω_2, acts trivially on the upper half-plane, so that we in
fact get an action of PSL_2(Z) = SL_2(Z)/{±Id} on H.
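
The action just described is easy to check numerically. The following
Python sketch is only an illustration: the helper names `mobius`,
`matmul`, and `det`, the two sample matrices, and the sample point are
our own choices and do not come from the text. It confirms that
determinant-1 integer matrices map a point of H back into H, that −Id
acts trivially, and that multiplying matrices corresponds to composing
the transformations.

```python
def mobius(A, w):
    """Apply A = ((a, b), (c, d)) to w by w -> (a*w + b) / (c*w + d)."""
    (a, b), (c, d) = A
    return (a * w + b) / (c * w + d)


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def det(A):
    """Determinant of a 2x2 matrix stored as nested tuples."""
    (a, b), (c, d) = A
    return a * d - b * c


# Two sample elements of SL_2(Z) (illustrative choices).
S = ((0, -1), (1, 0))        # w -> -1/w
T = ((1, 1), (0, 1))         # w -> w + 1
NEG_ID = ((-1, 0), (0, -1))  # the scalar matrix -Id

w = complex(0.3, 1.7)        # an arbitrary point with positive imaginary part

for A in (S, T, matmul(S, T)):
    assert det(A) == 1
    assert mobius(A, w).imag > 0   # the image is again in H

# -Id negates both basis vectors but fixes every point of H.
assert abs(mobius(NEG_ID, w) - w) < 1e-12

# Acting by the product S*T equals acting by T first and then by S.
assert abs(mobius(matmul(S, T), w) - mobius(S, mobius(T, w))) < 1e-12
print("all checks passed")
```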

So we come to the conclusion that elliptic curves (up
to isomorphism) are in bijection with orbits of PSL_2(Z)
on the upper half-plane, or equivalently with points of
the quotient space H/PSL_2(Z). This quotient space has
a natural quotient topology, and in fact can be given a
complex-analytic structure, which, it turns out, identifies
it with the complex plane C itself. To see this one
uses the classical modular function [IV.1 §8] j(z),
a complex-analytic function on H which is invariant
under the modular group PSL_2(Z) and which therefore
defines a natural coordinate H/PSL_2(Z) → C.

It appears that we have solved the moduli problem
for elliptic curves: we have a topological, and
even complex-analytic, space M_{1,1} = H/PSL_2(Z) whose
points are in one-to-one correspondence with isomorphism
classes of elliptic curves. This already qualifies
M_{1,1} as the coarse moduli space for elliptic curves,
which means it is as good a moduli space as we can
hope for. However, M_{1,1} fails an important test for a
moduli space that T_{1,1} passed (as we saw in section 2.2):
it is not true, even for the circle S = S^1, that every
continuous family of elliptic curves over S corresponds to
a map from S to M_{1,1}.

The reason for this failure is the problem of automorphisms.
These are equivalences from E to itself: that is,
complex-analytic maps from E to E that preserve the
basepoint e. Equivalently, they are given by complex-analytic
self-maps of C that preserve 0 and the lattice
L. Such a map must be a rotation: that is, multiplication
by some complex number λ of modulus 1. It is easy to
check that for most lattices L in the plane, the only
nontrivial rotation that sends L to itself is multiplication
by λ = −1. Note that this is the same −1 that we quotiented out
by to pass from SL_2(Z) to PSL_2(Z). However, there are
two special lattices that have greater symmetry. These
are the square lattice L = Z·1 ⊕ Z·i, corresponding to
the fourth root of unity i, and the hexagonal lattice
L = Z·1 ⊕ Z·e^{2πi/6}, corresponding to a sixth root of
unity. (Note that the hexagonal lattice is also represented
by the point ω = e^{2πi/3}.) The square lattice,
which corresponds to the elliptic curve formed by gluing
the opposite sides of a square, has as its symmetries
the group Z/4Z of rotational symmetries of the square.
The hexagonal lattice, which corresponds to the elliptic
curve formed by gluing the opposite sides of a regular
hexagon, has as its symmetries the group Z/6Z of
rotational symmetries of a hexagon.





We see that the number of automorphisms of an elliptic
curve jumps discontinuously at the special points
ω = i and ω = e^{2πi/6}. This already suggests that something
might be wrong with M_{1,1} as a moduli space.
Note that we avoided this problem with the moduli space
T_{1,1} of marked elliptic curves, since there are no
automorphisms of an elliptic curve that also preserve the
marking. Another place we might have observed this
problem with M_{1,1} is when we passed to the quotient
H/PSL_2(Z). We avoided the automorphism λ = −1
by quotienting by PSL_2(Z) rather than SL_2(Z). However,
the two special points i and e^{2πi/6} are preserved by
integer Möbius transformations of H other than the
identity, and they are the only points with that property.
This means that the quotient H/PSL_2(Z) naturally
comes with conical singularities at the points corresponding
to these two orbits: one looks like a cone with
angle π, and the other like a cone with angle 2π/3. (To
see why this is plausible, imagine the following simpler
instance of the same phenomenon. If for every complex
number z you identify z with −z, then the result is to
wrap the complex plane around into a cone with a singularity
at 0. The reason 0 is singled out is that it is preserved
by the transformation z ↦ −z. Here the angle
would be π because the identification of points is two-to-one
away from the singularity and π is half of 2π.)
It is possible to massage these singularities away using
the j-function, but they are indicating a basic difficulty.
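
The two cone angles can be read off from the stabilizers of the special
points. The Python sketch below is an illustration under our own
choices: the matrices called `S` and `TS` are standard elements of
SL_2(Z), but their names and the helper functions are not notation from
the text. It checks that each matrix fixes its special point and
computes its order in PSL_2(Z); the cone angle is then 2π divided by
that order.

```python
import cmath


def mobius(A, w):
    """Apply A = ((a, b), (c, d)) to w by w -> (a*w + b) / (c*w + d)."""
    (a, b), (c, d) = A
    return (a * w + b) / (c * w + d)


def matmul(A, B):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


IDENT = ((1, 0), (0, 1))
NEG_IDENT = ((-1, 0), (0, -1))


def order_in_psl2(A):
    """Smallest n >= 1 with A**n = +Id or -Id (A is assumed to have finite order)."""
    power, n = A, 1
    while power not in (IDENT, NEG_IDENT):
        power, n = matmul(power, A), n + 1
    return n


S = ((0, -1), (1, 0))     # w -> -1/w, fixes i
TS = ((1, -1), (1, 0))    # w -> (w - 1)/w, fixes e^{2 pi i / 6}

rho = cmath.exp(2j * cmath.pi / 6)

for name, A, point in (("S", S, 1j), ("TS", TS, rho)):
    assert abs(mobius(A, point) - point) < 1e-12   # the point is fixed
    n = order_in_psl2(A)
    print(f"{name}: order {n} in PSL_2(Z), cone angle 2*pi/{n}")
```

The orders come out as 2 and 3, matching the angles π and 2π/3 quoted
above.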

So why do automorphisms form an obstacle to the
existence of “good” moduli spaces? We can demonstrate
the difficulty by considering an interesting continuous
family of marked elliptic curves parametrized
by the circle S = S^1. Let E(i) be the “square” elliptic
curve that we considered earlier, based on the lattice
of integer combinations of 1 and i. Next, for every t
between 0 and 1, let E_t be a copy of E(i). Thus, we have
taken the constant, or “trivial,” family of elliptic curves
over the closed unit interval [0, 1], where every curve in
the family is E(i). Now we identify the elliptic curves at
the two ends of this family, not in the obvious way, but
by using the automorphism given by a 90° rotation, or
multiplication by i. This means that we are looking at
the family of elliptic curves over the circle where each
member of the family is a copy of the elliptic curve E(i),
but these copies twist by 90° as we go around the circle.

It is easy to see that there is no way to capture this
family of elliptic curves by means of a map from S^1 to
the space M_{1,1}. Since all of the members of the family
are isomorphic, each point of the circle should map to
the same point in M_{1,1} (the equivalence class of i in
H). But the constant map S^1 → {i} ∈ M_{1,1} classifies
the trivial family S^1 × E(i) of elliptic curves over S^1, that
is, the family where every curve is equal to E(i) but
the curves do not twist as we go around! Thus, there
are more families of elliptic curves than there are maps
to M_{1,1}: the quotient space H/PSL_2(Z) cannot handle
the complications caused by automorphisms. A variant
of this construction applies to complex-analytic families
with S^1 replaced by C^×. This is a very general phenomenon
in moduli problems: whenever objects have
nontrivial automorphisms, we can imitate the construction
above to get nontrivial families over an interesting
parameter set, all of whose members are the same. As
a result, they cannot be classified by a map to the set
of all isomorphism classes.
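
In symbols, the twisted family over the circle can be written as a
quotient of the trivial family over the interval. The notation for the
total space is ours, and gluing the fiber over 0 to the fiber over 1
(rather than the reverse) is just one of two equivalent conventions:

```latex
\[
  \mathcal{E} \;=\; \bigl([0,1] \times E(i)\bigr) \big/ \bigl((0, z) \sim (1, iz)\bigr)
  \;\longrightarrow\; [0,1]/(0 \sim 1) \;=\; S^1,
  \qquad [t, z] \longmapsto [t].
\]
```

Every fiber of this map is a copy of E(i), so the induced map from S^1
to the set of isomorphism classes is constant, exactly as it is for the
product S^1 × E(i). The two families differ only in the gluing, which a
map to the set of isomorphism classes has no way to record.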

What do we do about this problem? One approach 
is to resign ourselves to having coarse moduli spaces, 
which have the right points and right geometry but do 
not quite classify arbitrary families. Another approach 
is the one that leads to T_{1,1}: we can fix markings of
one kind or another, which “kill” all automorphisms.
In other words, we choose enough extra structure on
our objects so that there do not remain any (nontrivial)
automorphisms that preserve all this decoration.
In fact, one can be far more economical than picking
a basis of the lattice L and obtaining the infinite covering
T_{1,1} of M_{1,1}: one can fix a basis of L only up
to some congruence (for example, of L/2L). Finally, we
can simply learn to come to terms with the automor- 
phisms, keeping them as part of the data, resulting in 
“spaces” where points have internal symmetries. This is 
the notion of an orbifold [IV.4 §7], or stack [IV.4 §7], 
which is flexible enough to deal with essentially all 
moduli problems. 

3 Higher-Genus Moduli Spaces 
and Teichmüller Spaces

We would now like to generalize as much as possible
of the picture of elliptic curves and their moduli
to higher-genus Riemann surfaces. For each g we
would like to define a space M_g, called the moduli
space of curves of genus g, that classifies compact Riemann
surfaces of genus g and tells us how they modulate.
Thus, the points of M_g should correspond to our
objects, compact Riemann surfaces of genus g, or, to
be more accurate, equivalence classes of such surfaces,
where two surfaces are considered to be equivalent
if there is a complex-analytic isomorphism between
them. In addition, we would like M_g to do the best
it can to embody the structure of continuous families
of genus-g surfaces. Likewise, there are spaces
M_{g,n} parametrizing “n-punctured” Riemann surfaces
of genus g. This means we consider not “bare” Riemann
surfaces, but Riemann surfaces together with a “decoration”
or “marking” by n distinct labeled points (punctures).
Two of these are considered to be equivalent if
there is a complex-analytic isomorphism between them
that takes punctures to punctures and preserves labels.
Since there are Riemann surfaces with automorphisms,
we do not expect M_g to be able to classify all families
of Riemann surfaces: that is, we will expect examples
similar to the twisted square-lattice construction discussed
earlier. However, if we consider Riemann surfaces
with enough extra markings, then we will be able
to obtain a moduli space in the strongest sense. One
way to choose such markings is to consider M_{g,n} with
n large enough (for fixed g). Another approach will be
to mark generators of the fundamental group, leading
to the Teichmüller spaces T_g and T_{g,n}. We now outline
this process.

To construct the space M_g, we return to the uniformization
theorem. Any compact surface X of genus
g > 1 has as its universal cover the upper half-plane
H, so it is represented as a quotient X = H/Γ, where Γ
is a representation of the fundamental group of X as a
subgroup of conformal self-maps of H. The group of all
conformal automorphisms of H is PSL_2(R), the group of
linear fractional transformations with real coefficients.
The fundamental groups of all compact genus-g Riemann
surfaces are isomorphic to a fixed abstract group
Γ_g, with 2g generators A_i, B_i (i = 1, ..., g) and one
relation: that the product of all commutators
A_i B_i A_i^{-1} B_i^{-1} is the identity. A subgroup
Γ ⊂ PSL_2(R) that acts on H in
such a way that the quotient H/Γ is a Riemann surface
(technically, the action should have no fixed points and
should be properly discontinuous) is known as a Fuchsian
group [III.28]. Thus, the analogue of the representation
of elliptic curves by lattices L ≅ Z ⊕ Z in the plane
is the representation of higher-genus Riemann surfaces
as H/Γ, where Γ is a Fuchsian group.
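
Written out as a presentation (a standard way of recording generators
and relations), the abstract group described above is:

```latex
\[
  \Gamma_g \;=\; \Bigl\langle A_1, B_1, \dots, A_g, B_g \;\Bigm|\;
  \prod_{i=1}^{g} A_i B_i A_i^{-1} B_i^{-1} = 1 \Bigr\rangle .
\]
```

For g = 1 the single relation says that A_1 and B_1 commute, which
gives the group Z ⊕ Z, in agreement with the lattice picture for
elliptic curves.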

The Teichmüller space T_g of genus-g Riemann surfaces
is the space that solves the moduli problem
for genus-g surfaces when they come with a marking
of their fundamental group. This means that our
objects are genus-g surfaces X plus a set of generators
A_i, B_i of π_1(X), which give an isomorphism between
π_1(X) and Γ_g, up to conjugation.¹ Our equivalences


1. Note that while the fundamental group of X depends on the
choice of a basepoint, π_1(X, x) and π_1(X, y) may be identified by


are complex-analytic maps that preserve the markings. 
Finally, our continuous (respectively, complex-analytic) 
families are continuous (complex-analytic) families of 
Riemann surfaces with continuously varying markings 
of the fundamental group. In other words, we are 
asserting the existence of a topological space/complex 
manifold T_g, with a complex-analytic family of marked
Riemann surfaces over it, and the following strong 
property. 

The characteristic property of T_g. For any topological
space (respectively, complex manifold) S, there is a
bijection between continuous maps (respectively, holomorphic
maps) S → T_g and isomorphism classes of
continuous (respectively, complex-analytic) families of
marked genus-g surfaces parametrized by S.

3.1 Digression: “Abstract Nonsense” 

It is interesting to note that, while we have yet to
see why such a space exists, it follows from general,
nongeometric principles (category theory [III.8], or
“abstract nonsense”) that it is completely and uniquely
determined, both as a topological space and as a complex
manifold, by this characteristic property. In a very
abstract way, every topological space M can be uniquely
reconstructed from its set of points, the set of paths
between these points, the set of surfaces spanning
these paths, and so on. To put it differently, we can
think of M as a “machine” that assigns to any topological
space S the set of continuous maps from S to M.
This machine is known as the “functor of points of M.”
Similarly, a complex manifold M provides a machine
that assigns to any other complex manifold S the set of
complex-analytic maps from S to M. A curious discovery
of category theory (the Yoneda lemma) is that for
very general reasons (having nothing to do with geometry),
these machines (or functors) uniquely determine
M as a space, or a complex manifold.

Any moduli problem in the sense we have described 
(giving objects, equivalences, and families) also gives 
such a machine, where to S we assign the set of all families
over S, up to isomorphism. So just by setting up the
moduli problem we have already uniquely determined
the topology and complex structure on Teichmüller
space. The interesting part then is to know whether or 
not there actually exists a space giving rise to the same 


choosing a path from x to y, and the different choices are related by
conjugation by a loop. Thus, if we are willing to identify sets of
generators A_i, B_i when they differ only by a conjugation, then we can ignore
the choice of a basepoint.

machine we have constructed, whether we can construct
it explicitly, and whether we can use its geometry
to learn interesting facts about Riemann surfaces.

3.2 Moduli Spaces and Representations 

Coming back to earth, we discover that we have a fairly
concrete model of Teichmüller space at our disposal.
Once we have fixed the marking π_1(X) ≅ Γ_g, we are
simply looking at all ways to represent Γ_g as a Fuchsian
subgroup of PSL_2(R). Ignoring the Fuchsian condition
for a moment, this means finding 2g real matrices
(up to ±Id) A_i, B_i ∈ PSL_2(R) satisfying the commutator
relation of Γ_g. This gives an explicit set of (algebraic!)
equations for the entries of the 2g matrices,
which determine the space of all representations
Γ_g → PSL_2(R). We must now quotient out by the action of
PSL_2(R) that simultaneously conjugates all 2g matrices
to obtain the representation variety Rep(Γ_g, PSL_2(R)).
This is analogous to considering lattices in C up to rotation,
and is motivated by the fact that the quotients
of H by two conjugate subgroups of PSL_2(R) will be
isomorphic.

Once we have described the space of all representations
of Γ_g into PSL_2(R), we can then single out Teichmüller
space as the subset of the representation variety
that consists of Fuchsian representations of Γ_g into
PSL_2(R). Luckily this subset is open in the representation
variety, which gives a nice realization of T_g as
a topological space; in fact, T_g is homeomorphic to
R^{6g−6}. (This can be seen very explicitly in terms of
the Fenchel-Nielsen coordinates, which parametrize a
surface in T_g via a cut-and-paste procedure involving
3g − 3 lengths and 3g − 3 angles.) We may now try to
“forget” the marking π_1(X) ≅ Γ_g, to obtain the moduli
space M_g of unmarked Riemann surfaces. In other
words, we would like to take T_g and identify any two
points that represent the same underlying Riemann
surface with different markings. This identification is
achieved by the action of a group, the genus-g mapping
class group MCG_g or Teichmüller modular group, on
T_g, which generalizes the modular group PSL_2(Z) that
acts on H = T_{1,1}. (The mapping class group is defined
as the group of all self-diffeomorphisms of a genus-g
surface, remembering that all such surfaces are topologically
the same, modulo those diffeomorphisms that
act trivially on the fundamental group.) As in the case of
elliptic curves, Riemann surfaces with automorphisms
correspond to points in T_g fixed by some subgroup of
MCG_g, and give rise to singular points in the quotient
M_g = T_g/MCG_g.
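
The exponent 6g − 6 can be anticipated by a rough parameter count,
offered here as a heuristic rather than a proof. It uses the fact that
PSL_2(R) is three-dimensional: each of the 2g matrices contributes three
parameters, the single commutator relation is expected to remove three,
and simultaneous conjugation removes three more.

```latex
\[
  \underbrace{3 \cdot 2g}_{\text{entries of } A_i,\, B_i}
  \;-\; \underbrace{3}_{\text{commutator relation}}
  \;-\; \underbrace{3}_{\text{conjugation}}
  \;=\; 6g - 6
  \;=\; \underbrace{(3g - 3)}_{\text{lengths}} + \underbrace{(3g - 3)}_{\text{angles}} .
\]
```

The last equality matches the count of Fenchel-Nielsen coordinates
mentioned above.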


Representation varieties, or moduli spaces of repre- 
sentations, are an important and concrete class of mod- 
uli spaces that arise throughout geometry, topology, 
and number theory. Given any (discrete) group Γ, we
ask (for example) for a space that parametrizes homomorphisms
of Γ into the group of n × n matrices. The
notion of equivalence is given by conjugation by GL_n,
and that of families by continuous (or analytic, or algebraic,
etc.) families of matrices. This problem is interesting
even when the group Γ is Z. Then we are simply
considering invertible n × n matrices (the image
of 1 ∈ Z) up to conjugacy. It turns out that there is
no moduli space for this problem, even in the coarse 
sense, unless we consider only “nice enough” matri- 
ces: for example, matrices that consist of only a single 
Jordan block. This is a good example of a ubiquitous 
phenomenon in moduli problems: one is often forced 
to throw out some “bad” (unstable) objects in order to 
have any chance of obtaining a moduli space. (See the 
paper by Mumford and Suominen (1972) for a detailed 
discussion.) 
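
A small computation, included here as an illustration that is not taken
from the text, shows what goes wrong already for 2 × 2 matrices. For
every t ≠ 0, conjugating a Jordan block by a diagonal matrix gives

```latex
\[
  \begin{pmatrix} t & 0 \\ 0 & 1 \end{pmatrix}
  \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
  \begin{pmatrix} t & 0 \\ 0 & 1 \end{pmatrix}^{-1}
  \;=\;
  \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}
  \;\longrightarrow\;
  \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
  \quad \text{as } t \to 0 .
\]
```

So the conjugacy class of the Jordan block has the identity matrix in
its closure, although the identity is not conjugate to it. Any
continuous function on a would-be moduli space would have to take the
same value on both classes, so the two points could not be separated.
Restricting to single Jordan blocks excludes the identity and avoids
this particular collision.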

3.3 Moduli Spaces and Jacobians 

The upper half-plane H = T_{1,1}, together with the action
of PSL_2(Z), gives an appealingly complete picture of the
moduli problem for elliptic curves and its geometry.
The same cannot be said, unfortunately, for the picture
of T_g as an open subset of the representation variety.
In particular, the representation variety does not
even carry a natural complex structure, so we cannot
see from this description the geometry of T_g as a complex
manifold. This failure reflects some of the ways
in which the study of moduli spaces is more compli- 
cated for genus greater than 1. In particular, the mod- 
uli spaces of higher-genus surfaces are not described 
purely by linear algebra plus data about orientation, as 
is the case in genus 1. 

Part of the blame for this complexity lies with the fact
that the fundamental group Γ_g ≅ π_1(X) (g > 1) is no
longer Abelian, and in particular it is no longer equal to
the first homology group H_1(X, Z). A related problem is
that X is no longer a group. A beautiful solution to this
problem is given by the construction of the Jacobian
Jac(X), which shares with elliptic curves the properties
of being a torus (homeomorphic to (S^1)^{2g}), an Abelian
group, and a complex (in fact complex-algebraic) manifold.
(The Jacobian of an elliptic curve is the elliptic
curve itself.) The Jacobian captures the “Abelian” or
“linear” aspects of the geometry of X. There is a moduli
space A_g for such complex-algebraic tori (known as
Abelian varieties), which does share all of the nice properties
and linear algebraic description of the moduli of
elliptic curves M_{1,1} = A_1. The good news (the Torelli
theorem) is that by assigning to each Riemann surface
X its Jacobian we embed M_g as a closed, complex-analytic
subset of A_g. The interesting news (the Schottky
problem) is that the image is quite complicated
to characterize intrinsically. In fact, solutions to this
problem have come from as far afield as the study of
nonlinear partial differential equations!

3.4 Further Directions 

In this section we give hints at some interesting ques- 
tions about, and applications of, moduli spaces. 

Deformations and degenerations. Two of the main topics
in moduli spaces ask which objects are very near
to a given one, and what lies very far away. Deformation
theory is the calculus of moduli spaces: it describes
their infinitesimal structure. In other words, given an
object, deformation theory is concerned with describing
all its small perturbations (see Mazur (2004) for a
beautiful discussion of this). At the other extreme, we
can ask what happens when our objects degenerate.
Most moduli spaces, for example the moduli of curves,
are not compact, so there are families “going off to
infinity.” It is important to find “meaningful” compactifications
of moduli spaces, which classify the possible
degenerations of our objects. Another advantage of
compactifying moduli spaces is that we can then calculate
integrals over the completed space. This is crucial
for the next item.

Invariants from moduli spaces. An important application
of moduli spaces in geometry and topology is
inspired by quantum field theory, where a particle,
rather than following the “best” classical path between
two points, follows all paths with varying probabilities
(see mirror symmetry [IV.16 §2.2.4]). Classically, one
calculates many topological invariants by picking a geo- 
metric structure (such as a metric) on a space, calculat- 
ing some quantity using this structure, and finally prov- 
ing that the result of the calculation did not depend on 
the structure we chose. The new alternative is to look 
at all such geometric structures, and integrate some 
quantity over the space of all choices. The result, if
we can show convergence, will manifestly not depend
on any choices. String theory has given rise to many
important applications of this idea, in particular by
giving a rich structure to the collection of integrals
obtained in this way. Donaldson and Seiberg-Witten
theories use this philosophy to give topological invari- 
ants of four-manifolds. Gromov-Witten theory applies 
it to the topology of symplectic manifolds [III.90], 
and to counting problems in algebraic geometry, such 
as, How many rational plane curves of degree 5 pass 
through fourteen points in general position? (Answer: 
87 304.) 

Modular forms. One of the most profound ideas in 
mathematics, the Langlands program, relates number 
theory to function theory (harmonic analysis) on very
special moduli spaces, generalizing the moduli space
of elliptic curves. These moduli spaces (Shimura varieties)
are expressible as quotients of symmetric spaces
(such as H) by arithmetic groups (such as PSL_2(Z)).
Modular forms [III.61] and automorphic forms are
special functions on these moduli spaces, described
by their interaction with the large symmetry groups
of the spaces. This is an extremely exciting and active
area of mathematics, which counts among its recent triumphs
the proof of Fermat’s last theorem [V.12] and
the Shimura-Taniyama-Weil conjecture (Wiles, Taylor-Wiles,
Breuil-Conrad-Diamond-Taylor).

Further Reading 

For historical accounts and bibliographies on moduli 
spaces, the following articles are highly recommended. 

A beautiful and accessible overview of moduli spaces, 
with an emphasis on the notion of deformations, is 
given by Mazur (2004). The articles by Hain (2000) and 
Looijenga (2000) give excellent introductions to the 
study of the moduli spaces of curves, perhaps the oldest
and most important of all moduli problems. The
article by Mumford and Suominen (1972) introduces 
the key ideas underlying the study of moduli spaces 
in algebraic geometry.

aspects. In School on Algebraic Geometry, Trieste, 1999,
pp. 293-353. ICTP Lecture Notes Series, no. 1. Trieste: The

IV.9 Representation Theory

Ian Grojnowski 


1 Introduction 

It is a fundamental theme in mathematics that many
objects, both mathematical and physical, have symmetries.
The goal of group [I.3 §2.1] theory in general,
and representation theory in particular, is to study
these symmetries. The difference between representation
theory and general group theory is that in representation
theory one restricts one’s attention to symmetries
of vector spaces [I.3 §2.3]. I will attempt here
to explain why this is sensible and how it influences our 
study of groups, causing us to focus on groups with 
certain nice structures involving conjugacy classes. 

2 Why Vector Spaces? 

The aim of representation theory is to understand how 
the internal structure of a group controls the way it acts
externally as a collection of symmetries. In the other
direction, it also studies what one can learn about a
group’s internal structure by regarding it as a group of 
symmetries. 

We begin our discussion by making more precise 
what we mean by “acts as a collection of symmetries.” 
The idea we are trying to capture is that if we are given 
a group G and an object X, then we can associate with 
each element g of G some symmetry of X, which we call
φ(g). For this to be sensible, we need the composition
of symmetries to work properly: that is, φ(g)φ(h) (the
result of applying φ(h) and then φ(g)) should be the
same symmetry as φ(gh). If X is a set, then a symmetry
of X is a particular kind of permutation [III.70] of its
elements. Let us denote by Aut(X) the group of all
permutations of X. Then an action of G on X is defined to
be a homomorphism from G to Aut(X). If we are given
such a homomorphism, then we say that G acts on X. 

The image to have in mind is that G “does things” to 
X. This idea can often be expressed more conveniently 
and vividly by forgetting about φ in the notation: thus,
instead of writing φ(g)(x) for the effect on x of the
symmetry associated with g, we simply think of g itself
as a permutation and write gx. However, sometimes we
do need to talk about φ as well: for instance, we might
wish to compare two different actions of G on X.

Here is an example. Take as our object X a square in 
the plane, centered at the origin, and let its vertices be
A, B, C, and D (see figure 1). A square has eight symmetries:
four rotations by multiples of 90° and four reflections.
Let G be the group consisting of these eight symmetries;
this group is often called D_8, or the dihedral
group of order 8. By definition, G acts on the square. 
But it also acts on the set of vertices of the square: 
for instance, the action of the reflection through the 
y-axis is to switch A with B and C with D. It might seem 
as though we have done very little here. After all, we 
defined G as a group of symmetries so it does not take 
much effort to associate a symmetry with each element 
of G. However, we did not define G as a group of permu- 
tations of the set {A,B, C,D}, so we have at least done 
something. 

To make this point clearer, let us look at some other 
sets on which G acts, which will include any set that 
we can build sufficiently naturally from the square. 
For instance, G acts not only on the set of vertices 
{A,B,C,D}, but on the set of edges {AB, BC, CD, DA} 
and on the set of cross-diagonals {AC,BD} as well. 
Notice in the latter case that some of the elements of 
G act in the same way: for example, a clockwise rota- 
tion through 90° interchanges the two diagonals, as 
does a counterclockwise rotation through 90°. If all the 
elements of G act differently, then the action is called 
faithful. 
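
These actions are small enough to check by machine. The Python sketch
below is illustrative only. Figure 1 is not reproduced here, so the
placement of the labels is an assumption: A is taken to be the top-left
vertex with B, C, D following clockwise, which is consistent with the
statement that reflection through the y-axis switches A with B and C
with D. The function names are our own.

```python
VERTICES = "ABCD"   # assumed layout: A top-left, then clockwise


def compose(g, h):
    """The permutation 'apply h, then g' (both are dicts on VERTICES)."""
    return {v: g[h[v]] for v in VERTICES}


def generate(gens):
    """Close a list of permutations under composition."""
    def key(g):
        return tuple(g[v] for v in VERTICES)
    group = {key(g): g for g in gens}
    frontier = list(gens)
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if key(gh) not in group:
                group[key(gh)] = gh
                frontier.append(gh)
    return list(group.values())


rot = dict(zip("ABCD", "BCDA"))   # clockwise rotation through 90 degrees
ref = dict(zip("ABCD", "BADC"))   # reflection through the y-axis
G = generate([rot, ref])


def induced(g, subsets):
    """How g moves a list of vertex subsets (edges, diagonals, ...)."""
    return tuple(frozenset(g[v] for v in s) for s in subsets)


def is_faithful(subsets):
    """True if different elements of G act differently on the subsets."""
    return len({induced(g, subsets) for g in G}) == len(G)


POINTS = [frozenset(v) for v in VERTICES]
EDGES = [frozenset(e) for e in ("AB", "BC", "CD", "DA")]
DIAGONALS = [frozenset(d) for d in ("AC", "BD")]

print(len(G))                  # 8
print(is_faithful(POINTS))     # True
print(is_faithful(EDGES))      # True
print(is_faithful(DIAGONALS))  # False

# Clockwise and counterclockwise quarter turns act identically on the diagonals.
rot_inverse = compose(rot, compose(rot, rot))
assert induced(rot, DIAGONALS) == induced(rot_inverse, DIAGONALS)
```

The action on the two diagonals factors through a group with only two
elements, which is why it cannot distinguish all eight symmetries.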

Notice that the operations on the square (“reflect
through the y-axis,” “rotate through 90°,” and so on)
can be applied to the whole Cartesian plane R^2. Therefore,
R^2 is another (and much larger) set on which G
acts. To call R^2 a set, though, is to forget the very
interesting fact that the elements in R^2 can be added
together and multiplied by real numbers: in other
words, R^2 is a vector space. Furthermore, the action
of G is well-behaved with respect to this extra structure.
For instance, if g is one of our symmetries and v_1
and v_2 are two elements of R^2, then g applied to the
sum v_1 + v_2 yields the sum g(v_1) + g(v_2). Because of
this, we say that G acts linearly on the vector space R^2.
When V is a vector space, we denote by GL(V) the set
of invertible linear maps from V to V. If V is the vector
space R^n, this group is the familiar group GL_n(R)
of invertible n × n matrices with real entries; similarly,
when V = C^n it is the group of invertible matrices with
complex entries.

Definition. A representation of a group G on a vector 
space V is a homomorphism from G to GL(V). 

In other words, a group action is a way of regarding 
a group as a collection of permutations, while a repre- 
sentation is the special case where these permutations 
are invertible linear maps. One sometimes sees repre-
sentations referred to, for emphasis, as linear repre-
sentations. In the representation of $D_8$ on $\mathbb{R}^2$ that we
described above, the homomorphism from G to $\mathrm{GL}_2(\mathbb{R})$
took the symmetry “clockwise rotation through 90°” to
the matrix $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and the symmetry “reflection through
the y-axis” to the matrix $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$.

Given one representation of G, we can produce oth- 
ers using natural constructions from linear algebra. For 
example, if $\rho$ is the representation of G on $\mathbb{R}^2$ described
above, then $\det \rho$ (see determinants [III.15]) is a homo-
morphism from G to $\mathbb{R}^*$ (the group of nonzero real
numbers under multiplication), since
\[ \det(\rho(gh)) = \det(\rho(g)\rho(h)) = \det(\rho(g)) \det(\rho(h)), \]
by the multiplicative property of determinants. This
makes $\det \rho$ a one-dimensional representation, since
each nonzero real number t can be thought of as the
element “multiply by t” of $\mathrm{GL}_1(\mathbb{R})$. If $\rho$ is the represen-
tation of $D_8$ just discussed, then under $\det \rho$ we find
that rotations act as the identity and reflections act as
multiplication by -1.
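
The two-dimensional representation and its determinant can be checked by direct computation. The following is a minimal sketch, assuming the two matrices for “clockwise rotation through 90°” and “reflection through the y-axis” are (0 1; -1 0) and (-1 0; 0 1); the helper names mul, det, and generate are illustrative choices, not notation from the text.

```python
from itertools import product

# 2x2 integer matrices stored as tuples of rows.
R = ((0, 1), (-1, 0))   # clockwise rotation through 90 degrees (assumed matrix)
S = ((-1, 0), (0, 1))   # reflection through the y-axis (assumed matrix)
I = ((1, 0), (0, 1))    # identity

def mul(a, b):
    """Matrix product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def generate(gens):
    """Close the identity under right multiplication by the generators."""
    group = {I}
    frontier = [I]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

G = generate([R, S])
rotations = {g for g in G if det(g) == 1}
reflections = {g for g in G if det(g) == -1}

# det(rho(gh)) = det(rho(g)) det(rho(h)) for every pair of elements.
det_is_multiplicative = all(
    det(mul(g, h)) == det(g) * det(h) for g, h in product(G, G)
)

print(len(G), len(rotations), len(reflections), det_is_multiplicative)
```

The eight matrices found this way split into four of determinant 1 and four of determinant -1, which matches the description of det ρ as sending rotations to 1 and reflections to -1.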

The definition of “representation” is formally very 
similar to the definition of “action,” and indeed, since 
every linear automorphism of V is a permutation on 
the set of vectors in V, the representations of G on V 
form a subset of the actions of G on V. But the set of 
representations is in general a much more interesting 
object. We see here an instance of a general principle: 
if a set comes equipped with some extra structure (as 
a vector space comes with the ability to add elements 
together), then it is a mistake not to make use of that 
structure; and the more structure the better. 


In order to emphasize this point, and to place rep- 
resentations in a very favorable light, let us start by 
considering the general story of actions of groups on 
sets. Suppose, then, that G is a group that acts on a set 
X. For each x, the set of all elements of the form gx, as 
g ranges over G, is called the orbit of x. It is not hard 
to show that the orbits form a partition of X. 

Example. Let G be the dihedral group $D_8$ acting on
the set X of ordered pairs of vertices of the square, of
which there are sixteen. Then there are three orbits of
G on X, namely {AA, BB, CC, DD}, {AB, BA, BC, CB, CD,
DC, DA, AD}, and {AC, CA, BD, DB}.
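
The orbit count in this example can be reproduced mechanically. The sketch below assumes the vertices A, B, C, D are labeled in cyclic order around the square, so that a quarter turn sends A to B to C to D, and it takes the reflection that switches A with B and C with D as the second generator; the function names are hypothetical.

```python
from itertools import product

VERTICES = "ABCD"                  # assumed to be in cyclic order around the square
ROT = dict(zip("ABCD", "BCDA"))    # a rotation through 90 degrees
REF = dict(zip("ABCD", "BADC"))    # reflection switching A with B and C with D

def compose(g, h):
    """Return the symmetry 'first h, then g'."""
    return {v: g[h[v]] for v in VERTICES}

def generate(gens):
    """Close the identity under left multiplication by the generators."""
    identity = {v: v for v in VERTICES}
    seen = {tuple(identity[v] for v in VERTICES)}
    group, frontier = [identity], [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            key = tuple(h[v] for v in VERTICES)
            if key not in seen:
                seen.add(key)
                group.append(h)
                frontier.append(h)
    return group

def orbits(group, points, act):
    """Partition the points into orbits under the given action."""
    remaining, result = set(points), []
    while remaining:
        x = min(remaining)
        orbit = {act(g, x) for g in group}
        result.append(sorted(orbit))
        remaining -= orbit
    return result

G = generate([ROT, REF])
pairs = ["".join(p) for p in product(VERTICES, repeat=2)]
pair_orbits = orbits(G, pairs, lambda g, xy: g[xy[0]] + g[xy[1]])

for orbit in pair_orbits:
    print(len(orbit), orbit)
```

Under these assumptions the sixteen ordered pairs fall into orbits of sizes 4, 8, and 4, in agreement with the three orbits listed above.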

An action of G on X is called transitive if there is just 
one orbit. In other words, it is transitive if for every 
x and y in X you can find an element g such that 
gx = y. When an action is not transitive, we can con- 
sider the action of G on each orbit separately, which 
effectively breaks up the action into a collection of 
transitive actions on disjoint sets. So in order to study 
all actions of G on sets it suffices to study transitive 
actions; you can think of actions as “molecules” and 
transitive actions as the “atoms” into which they can 
be decomposed. We shall see that this idea of decom- 
posing into objects that cannot be further decomposed 
is fundamental to representation theory. 

What are the possible transitive actions? A rich 
source of such actions comes from subgroups H of G. 
Given a subgroup H of G, a left coset of H is a set of 
the form $\{gh : h \in H\}$, which is commonly denoted by
gH. An elementary result in group theory is that the 
left cosets form a partition of G (as do the right cosets, 
if you prefer them). There is an obvious action of G on 
the set of left cosets of H, which we denote by G/H: if 
g' is an element of G, then it sends the coset gH to the 
coset (g'g)H.

It turns out that every transitive action is of this form! 
Given a transitive action of G on a set X, choose some
$x \in X$ and let $H_x$ be the subgroup of G consisting of
all elements h such that hx = x. (This set is called the
stabilizer of x.) Then one can check that the action of G
on X is the same¹ as that of G on the left cosets of $H_x$.
For example, the action of $D_8$ on the first orbit above is
isomorphic to the action on the left cosets of the two-
element subgroup H generated by a reflection of the
square through its diagonal. If we had made a different


1. By “the same” we mean “isomorphic as sets with G-action.” The
casual reader may read this as “the same,” while the more careful
reader should stop here and work out, or look up, precisely what is

choice of x, for example the point $x' = gx$, then the
subgroup of G fixing $x'$ would just be $gH_xg^{-1}$. This is
a so-called conjugate subgroup, and it gives a different
description of the same orbit, this time as left cosets of
$gH_xg^{-1}$.

It follows that there is a one-to-one correspondence 
between transitive actions of G and conjugacy classes 
of subgroups (that is, collections of subgroups conju- 
gate to some given subgroup). If G acts on our original 
set X in a nontransitive way, then we can break X up 
into a union of orbits, each of which, as a result of this 
correspondence, is associated with a conjugacy class of 
subgroups. This gives us a convenient “bookkeeping” 
mechanism for describing the action of G on X: just 
keep track of how many times each conjugacy class of 
subgroups arises. 

Exercise. Check that in the example earlier the three 
orbits correspond (respectively) to a two-element sub- 
group R generated by reflection through a diagonal, the 
trivial subgroup, and another copy of the group R. 

This completely solves the problem of how groups 
act on sets. The internal structure that controls the
action is the subgroup structure of G. 

In a moment we will see the corresponding solution 
to the problem of how groups act on vector spaces. 
First, let us just stare at sets for a while and see why, 
though we have answered our question, we should not 
feel too happy about it.²

The problem is that the subgroup structure of a 
group is just horrible. 

For example, any finite group of order n is a subgroup
of the symmetric group [III.70] $S_n$ (this is “Cayley’s
theorem,” which follows by considering the action of
G on itself), so in order to list the conjugacy classes of
subgroups of the symmetric group $S_n$ one must under-
stand all finite groups of size less than n.³ Or consider
the cyclic group Z/nZ. The subgroups correspond to 
the divisors of n, a subtle property of n that makes 
the cyclic groups behave quite differently as n varies. 
If n is prime, then there are very few subgroups, while 
if n is a power of 2 there are quite a few. So number 
theory is involved even if all we want to do is under- 
stand the subgroup structure of a group as simple as a 
cyclic group. 


2. Exercise: go back to the example of $D_8$ and list all the possible
transitive actions.

3. THE CLASSIFICATION OF FINITE SIMPLE GROUPS [V.8] does at least
allow us to estimate the number $\gamma_n$ of subgroups of $S_n$ up to conju-
gacy: it is a result of Pyber that $2^{((1/16)+o(1))n^2} \le \gamma_n \le 24^{((1/6)+o(1))n^2}$.
Equality is expected for the lower bound.


With some relief we now turn our attention back 
to linear representations. We will see that, just as 
with actions on sets, one can decompose represen- 
tations into “atomic” ones. But, by contrast with the 
case of sets, these atomic representations (called “irre- 
ducibles”) turn out to exhibit quite beautiful regulari- 
ties. 

The nice properties of representation theory come
largely from the following fact. While elements of the
symmetric group $S_n$ can be multiplied together, ele-
ments of GL(V), being matrices, can be added as well
as multiplied. (But beware: the sum of two elements of
GL(V) is not necessarily an element of GL(V), because
it may not be invertible. It is, however, an element of the
endomorphism algebra End(V). When $V = \mathbb{C}^n$, End(V)
is just the familiar algebra of all $n \times n$ matrices with
complex entries, both invertible and not.)

To see the difference it makes to be able to add, con-
sider the cyclic group G = Z/nZ. For each $\omega \in \mathbb{C}$ with
$\omega^n = 1$, we get a representation $\chi_\omega$ of G on $\mathbb{C}$ by asso-
ciating the element $r \in \mathbb{Z}/n\mathbb{Z}$ with multiplication by
$\omega^r$, which we think of as a linear map from the one-
dimensional space $\mathbb{C}$ to itself. This gives us n differ-
ent one-dimensional representations, one for each nth
root of unity, and it turns out that there are no others.
Moreover, if $\rho : G \to \mathrm{GL}(V)$ is any representation of
Z/nZ, then we can write it as a direct sum of these rep-
resentations by imitating the formula for finding the
Fourier mode of a function. Using the representation
$\rho$, we associate with each r in Z/nZ a linear map $\rho(r)$.
Now let us define a linear map $p_\omega : V \to V$ by the
formula

\[ p_\omega = \frac{1}{n} \sum_{0 \le r < n} \omega^{-r} \rho(r). \]

Then $p_\omega$ is an element of End(V), and one can check
that it is actually a projection [III.52 §3.5] onto a sub-
space $V_\omega$ of V. In fact, this subspace is an eigenspace
[I.3 §4.3]: it consists of all vectors v such that
$\rho(1)v = \omega v$, which implies, since $\rho$ is a representation, that
$\rho(r)v = \omega^r v$. The projection $p_\omega$ should be thought of
as the analogue of the nth Fourier coefficient [III.27]
$a_n(f)$ of a function $f(\theta)$ on the circle; note the formal
similarity of the above formula to the Fourier expansion
formula $a_n(f) = \int e^{-2\pi i n\theta} f(\theta)\, d\theta$.

Now the interesting thing about the Fourier series of
f is that, under favorable circumstances, it adds up to
f itself: that is, it decomposes f into trigonometric
functions [III.94]. Similarly, what is interesting about
the subspaces $V_\omega$ is that we can use them to decom-
pose the representation $\rho$. The composition of any two
distinct projections $p_\omega$ is 0, from which it can be shown
that

\[ V = \bigoplus_{\omega} V_\omega. \]

We can write each subspace $V_\omega$ as a sum of one-
dimensional spaces, which are copies of $\mathbb{C}$, and the
restriction of $\rho$ to any one of these is just the sim-
ple representation $\chi_\omega$ defined earlier. Thus, $\rho$ has been
decomposed as a combination of very simple “atoms”
$\chi_\omega$.⁴
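
The properties claimed for the projections can be tested numerically in a small case. The sketch below is an illustration under stated assumptions: it takes n = 4 and lets Z/nZ act on functions on itself by cyclic shifts (one convenient choice of representation, not one singled out by the text), then forms the averaged operators (1/n) Σ ω^(-r) ρ(r) for each nth root of unity ω. All function names are hypothetical.

```python
import cmath

n = 4  # illustrative choice of cyclic group Z/nZ

def shift(r):
    """Permutation matrix rho(r) sending basis vector e_j to e_{j+r mod n}."""
    return [[1 if (j + r) % n == i else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def close(a, b, tol=1e-9):
    """Entrywise comparison up to floating-point tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

def root(k):
    """The nth root of unity exp(2 pi i k / n)."""
    return cmath.exp(2j * cmath.pi * k / n)

def projection(k):
    """p_omega = (1/n) * sum over r of omega^(-r) rho(r), with omega = root(k)."""
    omega = root(k)
    p = [[0j] * n for _ in range(n)]
    for r in range(n):
        rho_r = shift(r)
        for i in range(n):
            for j in range(n):
                p[i][j] += omega ** (-r) * rho_r[i][j] / n
    return p

P = [projection(k) for k in range(n)]
zero = [[0] * n for _ in range(n)]
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

idempotent = all(close(matmul(p, p), p) for p in P)
orthogonal = all(close(matmul(P[a], P[b]), zero)
                 for a in range(n) for b in range(n) if a != b)
total = [[sum(p[i][j] for p in P) for j in range(n)] for i in range(n)]
complete = close(total, identity)
# rho(1) acts on the image of p_omega as multiplication by omega.
eigen = all(close(matmul(shift(1), P[k]), scale(root(k), P[k])) for k in range(n))

print(idempotent, orthogonal, complete, eigen)
```

The four checks correspond to the statements that each averaged operator is a projection, that distinct ones compose to 0, that together they sum to the identity (so V is the direct sum of their images), and that ρ(1) acts on each image as multiplication by the corresponding root of unity.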

This ability to add matrices has a very useful conse-
quence. Let a finite group G act on a complex vector
space V. A subspace W of V is called G-invariant if
gW = W for every $g \in G$. Let W be a G-invariant sub-
space, and let U be a complementary subspace (that is,
one such that every element v of V can be written in
exactly one way as w + u with $w \in W$ and $u \in U$). Let
$\phi$ be an arbitrary projection onto U with kernel W. Then it is a simple
exercise to show that the linear map $\frac{1}{|G|} \sum_{g \in G} g\phi g^{-1}$ is
also a projection onto a complementary subspace, but
with the added advantage that it is G-invariant. This lat-
ter fact follows because conjugating the sum by an
element g' just rearranges its terms.
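
The averaging trick can be watched in action on a small example. The sketch below is an illustration rather than anything taken from the text: it assumes the group is the symmetric group on three letters acting on a three-dimensional space by permuting coordinates, takes W to be the line spanned by (1, 1, 1), takes U to be the span of the first two basis vectors, and reads the average as being over the conjugates of the projection by the group elements. Exact rational arithmetic is used so that the checks are equalities.

```python
from fractions import Fraction
from itertools import permutations

N = 3

def perm_matrix(p):
    """Matrix sending basis vector e_j to e_{p[j]}."""
    return [[Fraction(1 if p[j] == i else 0) for j in range(N)] for i in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(col) for col in zip(*a)]

group = [perm_matrix(p) for p in permutations(range(N))]

# W = span of (1, 1, 1) is invariant; U = span(e_0, e_1) is a complement that is not.
# phi projects onto U with kernel W: (a, b, c) -> (a - c, b - c, 0).
phi = [[Fraction(1), Fraction(0), Fraction(-1)],
       [Fraction(0), Fraction(1), Fraction(-1)],
       [Fraction(0), Fraction(0), Fraction(0)]]

# Average the conjugates g phi g^{-1}; for a permutation matrix the inverse
# is the transpose.
avg = [[Fraction(0)] * N for _ in range(N)]
for g in group:
    conj = matmul(matmul(g, phi), transpose(g))
    for i in range(N):
        for j in range(N):
            avg[i][j] += conj[i][j] / len(group)

is_projection = matmul(avg, avg) == avg
commutes = all(matmul(g, avg) == matmul(avg, g) for g in group)
kills_W = all(sum(row) == 0 for row in avg)

for row in avg:
    print([str(x) for x in row])
print(is_projection, commutes, kills_W)
```

In this example the averaged map is still a projection that vanishes on W, but unlike the original it commutes with every group element, so its image (the vectors whose coordinates sum to zero) is an invariant complement of W.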

The reason this is so useful is that it allows us to
decompose an arbitrary representation into a direct
sum of irreducible representations, which are represen-
tations without a proper nonzero G-invariant subspace.
Indeed, if $\rho$ is
not irreducible, then there is a G-invariant subspace
W. By the above remark, we can write $V = W \oplus W'$
with W' also G-invariant. If either W or W' has a fur-
ther G-invariant subspace, then we can decompose it
further, and so on. We have just seen this done for
the cyclic group: in that case the irreducible repre-
sentations were the one-dimensional representations
$\chi_\omega$.

The irreducible representations are the basic build-
ing blocks of arbitrary complex representations, just
as the basic building blocks for actions on sets are the
transitive actions. It raises the question of what the irre-
ducible representations are, a question that has been
answered for many important examples, but which is
not yet solvable by any general procedure.

To return to the difference between actions and rep- 
resentations, another important observation is that any 
action of a group G on a finite set X can be linearized 
in the following sense. If X has n elements, then we can


4. To summarize the rest of this article: the similarity to the Fourier
transform is not just an analogy: decomposing a representation into its
irreducible summands is a notion that includes both this example and
the Fourier transform.


look at the Hilbert space [III.37] $L^2(X)$ of all complex-
valued functions defined on X. This has a natural basis
given by the “delta functions” $\delta_x$, which send x to 1 and
all other elements of X to 0. Now we can turn the action
of G on X into an action of G on the basis in an obvious
way: we just define $g\delta_x$ to be $\delta_{gx}$. We can extend this
definition by linearity, since an arbitrary function f is a
linear combination of the basis functions $\delta_x$. This gives
us an action of G on $L^2(X)$, which can be defined by a
simple formula: if f is a function in $L^2(X)$, then gf is
the function defined by $(gf)(x) = f(g^{-1}x)$. Equiva-
lently, gf does to gx what f does to x. Thus, an action
on sets can be thought of as an assignment of a very
special matrix to every group element, namely a matrix
with only 0s and 1s and precisely one 1 in each row
and each column. (Such matrices are called permutation
matrices.) By contrast, a general representation assigns
an arbitrary invertible matrix.

Now, even when X itself is a single orbit under the 
action of G, the above representation on $L^2(X)$ can
break up into pieces. For an extreme example of this 
phenomenon, consider the action of Z/nZ on itself by 
multiplication. We have just seen that, by means of the 
“Fourier expansion” above, this breaks up into a sum 
of n one-dimensional representations. 

Let us now consider the action of an arbitrary group 
G on itself by multiplication, or, to be more precise, left 
multiplication. That is, we shall associate with each ele-
ment g the permutation of G that takes each h in G to
gh. This action is obviously transitive. As an action on
a set it cannot be decomposed any further. But when
we linearize this action to a representation of G on the
vector space $L^2(G)$, we have much greater flexibility to
decompose the action. It turns out that, not only does
it break up into a direct sum of many irreducible rep-
resentations, but every irreducible representation $\rho$ of
G occurs as one of the summands in this direct sum,
and the number of times that $\rho$ appears is equal to the
dimension of the subspace on which it acts.

The representation we have just discussed is called 
the left regular representation of G. The fact that
every irreducible representation occurs in it so regu- 
larly makes it extremely useful. Notice that it is easier to 
decompose representations on complex vector spaces 
than on real vector spaces, since every automorphism 
of a complex vector space has an eigenvector. So it is 
simplest to begin by studying complex representations. 

The time has now come to state the fundamental the-
orem about complex representations of finite groups.





This theorem tells us how many irreducible representa- 
tions there are for a finite group, and, more colorfully, 
that representation theory is a “non-Abelian analogue 
of Fourier decomposition.” 

Let $\rho : G \to \mathrm{End}(V)$ be a representation of G. The
character $\chi_\rho$ of $\rho$ is defined to be its trace: that is, $\chi_\rho$ is
a function from G to $\mathbb{C}$ and $\chi_\rho(g) = \mathrm{tr}(\rho(g))$ for each g
in G. Since tr(AB) = tr(BA) for any two matrices A and
B, we have $\chi_\rho(hgh^{-1}) = \chi_\rho(g)$. Therefore, $\chi_\rho$ is very
far from an arbitrary function on G: it is a function that
is constant on each conjugacy class. Let $K_G$ denote the
vector space of all complex-valued functions on G with
this property; it is called the representation ring of G.

The characters of the irreducible representations of 
a group form a very important set of data about the 
group, which it is natural to organize into a matrix. The
columns are indexed by the conjugacy classes, the rows 
by the irreducible representations, and each entry is the 
value of the character of the given representation at the 
given conjugacy class. This array is called the character 
table of the group, and it contains all the important 
information about representations of the group: it is 
our periodic table. The basic theorem of the subject is 
that this array is a square. 

Theorem (the character table is square). Let G be
a finite group. Then the characters of the irreducible
representations form an orthonormal basis of $K_G$.

When we say that the basis of characters is orthonor-
mal we mean that the Hermitian inner product defined
by

\[ \langle \chi, \psi \rangle = |G|^{-1} \sum_{g \in G} \chi(g) \overline{\psi(g)} \]

is 1 when $\chi = \psi$ and 0 otherwise. The fact that it
is a basis implies in particular that there are exactly
as many irreducible representations as there are con-
jugacy classes in G, and the map from isomorphism
classes of representations to $K_G$ that sends each $\rho$ to
its character is an injection. That is, an arbitrary rep-
resentation is determined up to isomorphism by its
character.

The internal structure of a group G that controls how
it can act on vector spaces is the structure of conju-
gacy classes of elements of G. This is a much gentler
structure than the set of all conjugacy classes of sub-
groups of G. For example, in the symmetric group $S_n$
two permutations belong to the same conjugacy class
if and only if they have the same cycle type. Therefore,
in that group there is a bijection between conjugacy
classes and partitions of n.⁵

Furthermore, whereas it is completely unclear how to
count subgroups, conjugacy classes are much easier to
handle. For instance, since they partition the group, we
have the formula $|G| = \sum_{C \text{ a conjugacy class}} |C|$. On the rep-
resentation side, there is a similar formula, which arises
from the decomposition of the regular representation
$L^2(G)$ into irreducibles: $|G| = \sum_{V \text{ irreducible}} (\dim V)^2$. It
is inconceivable that there might be a similarly simple
formula for sums over all subgroups of a group.

We have reduced the problem of understanding the 
general structure of the representations of a finite 
group G to the problem of determining the character 
table of G. When G = Z/nZ, our description of the n 
irreducible representations above implies that all the 
entries of this matrix are roots of unity. Here are the 
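character tables for $D_8$ (on the left), the group of sym-
metries of the square, and, just for contrast, for the
group Z/3Z (on the right):

\[
\begin{array}{rrrrr}
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & -1 & -1 \\
1 & 1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 & 1 \\
2 & -2 & 0 & 0 & 0
\end{array}
\qquad
\begin{array}{lll}
1 & 1 & 1 \\
1 & z & z^2 \\
1 & z^2 & z
\end{array}
\]

where $z = \exp(2\pi i/3)$.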
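
Both tables can be checked against the theorem directly. The sketch below evaluates the Hermitian inner product class by class, which requires the sizes of the conjugacy classes; these are not printed with the tables, so the sizes 1, 1, 2, 2, 2 for the dihedral group (identity, the half turn, then three classes of two elements) and 1, 1, 1 for Z/3Z are supplied here as an assumption about the column order. The function names are illustrative.

```python
import cmath

def inner(chi, psi, sizes):
    """<chi, psi> = |G|^{-1} * sum over g of chi(g) * conj(psi(g)), by classes."""
    order = sum(sizes)
    return sum(s * a * complex(b).conjugate()
               for s, a, b in zip(sizes, chi, psi)) / order

def is_orthonormal(table, sizes, tol=1e-9):
    return all(
        abs(inner(chi, psi, sizes) - (1 if i == j else 0)) < tol
        for i, chi in enumerate(table)
        for j, psi in enumerate(table)
    )

# Dihedral group of order 8: rows as in the left-hand table.
d8_sizes = [1, 1, 2, 2, 2]   # assumed class sizes, in column order
d8_table = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, -1, -1],
    [1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1],
    [2, -2, 0, 0, 0],
]

# Z/3Z: rows as in the right-hand table.
z = cmath.exp(2j * cmath.pi / 3)
z3_sizes = [1, 1, 1]
z3_table = [[1, 1, 1], [1, z, z ** 2], [1, z ** 2, z]]

d8_ok = is_orthonormal(d8_table, d8_sizes)
z3_ok = is_orthonormal(z3_table, z3_sizes)

# The first column holds the dimensions; their squares should sum to |G|.
d8_dims_squared = sum(row[0] ** 2 for row in d8_table)

# The regular representation has character |G| at the identity and 0 elsewhere,
# so its inner product with each irreducible character gives the multiplicity.
regular = [8, 0, 0, 0, 0]
multiplicities = [round(inner(regular, chi, d8_sizes).real) for chi in d8_table]

print(d8_ok, z3_ok, d8_dims_squared, multiplicities)
```

With these class sizes both tables come out orthonormal, the squares of the dimensions sum to 8, and each irreducible representation occurs in the regular representation as many times as its dimension.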

The obvious question (Where did the first table come
from?) indicates the main problem with the theorem:
though it tells us the shape of the character table, it
leaves us no closer to understanding what the actual
character values are. We know how many representa- 
tions there are, but not what they are, or even what 
their dimensions are. We do not have a general method 
for constructing them, a kind of “non-Abelian Fourier 
transform.” This is the central problem of representa- 
tion theory. 

Let us see how this problem can be solved for the
group $D_8$. Over the course of this article, we have
already encountered three irreducible representations
of this group. The first is the “trivial” one-dimensional
representation: the homomorphism $\rho : D_8 \to \mathrm{GL}_1$ that
takes every element of $D_8$ to the identity. The second is
the two-dimensional representation we wrote down in
the first section, where each element of $D_8$ acts on $\mathbb{R}^2$


5. Not only is the set of all partitions a sensible combinatorial
object, it is far smaller than the set of all subgroups of $S_n$: Hardy
[VI.73] and Ramanujan [VI.82] showed that the number of partitions
of n is about $\frac{1}{4n\sqrt{3}} e^{\pi\sqrt{2n/3}}$.

in the obvious way. The determinant of this represen-
tation is a one-dimensional representation that is not
trivial: it sends the rotations to 1 and the reflections to
-1. So we have constructed three of the rows of the
character table above. There are five conjugacy classes
in $D_8$ (trivial, reflection through axis, reflection through
diagonal, 90° rotation, 180° rotation), so we know that
there are just two more rows.

The equality $|G| = 8 = 2^2 + 1 + 1 + (\dim V_4)^2 + (\dim V_5)^2$
implies that these missing representations
are one dimensional. One way of getting the missing
character values is to use orthogonality of characters.

A slightly (but only slightly) less ad hoc way is to
decompose $L^2(X)$ for small X. For example when X is
the pair of diagonals {AC, BD}, we have $L^2(X) = V_4 \oplus \mathbb{C}$,
where $\mathbb{C}$ is the trivial representation.
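
The decomposition for the pair of diagonals can be confirmed with characters alone. The sketch below assumes the vertices are labeled A, B, C, D in cyclic order, builds the eight symmetries as permutations of the vertices, and uses the standard fact that the character of a permutation representation counts fixed points; the names chi and psi are hypothetical labels for the permutation character and for what remains after removing the trivial character.

```python
from fractions import Fraction

VERTICES = "ABCD"                   # assumed to be in cyclic order around the square
ROT = dict(zip("ABCD", "BCDA"))     # a rotation through 90 degrees
REF = dict(zip("ABCD", "BADC"))     # the reflection switching A with B and C with D
DIAGONALS = [frozenset("AC"), frozenset("BD")]

def compose(g, h):
    """Return the symmetry 'first h, then g'."""
    return {v: g[h[v]] for v in VERTICES}

def key(g):
    return tuple(g[v] for v in VERTICES)

def generate(gens):
    """Close the identity under left multiplication by the generators."""
    identity = {v: v for v in VERTICES}
    group = {key(identity): identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if key(h) not in group:
                group[key(h)] = h
                frontier.append(h)
    return list(group.values())

def chi(g):
    """Permutation character on the diagonals: the number of diagonals g fixes."""
    return sum(1 for d in DIAGONALS if frozenset(g[v] for v in d) == d)

def psi(g):
    """What is left of chi after removing one copy of the trivial character."""
    return chi(g) - 1

G = generate([ROT, REF])

# chi is real valued, so no complex conjugation is needed in the inner products.
norm = Fraction(sum(chi(g) ** 2 for g in G), len(G))        # <chi, chi>
trivial_mult = Fraction(sum(chi(g) for g in G), len(G))     # <chi, trivial>
psi_is_homomorphism = all(
    psi(compose(g, h)) == psi(g) * psi(h) for g in G for h in G
)

print(norm, trivial_mult, psi(ROT), psi(REF), psi_is_homomorphism)
```

An inner product of 2 with itself and 1 with the trivial character means the two-dimensional space of functions on the diagonals is the trivial representation plus one further one-dimensional representation, whose character takes the value -1 on the quarter turn and on the reflection switching A with B.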

We are now going to start pointing the way toward 
some more modern topics in representation theory. Of 
necessity, we will use language from fairly advanced 
mathematics: the reader who is familiar with only some 
of this language should consider browsing the remain- 
ing sections, since different discussions have different 
prerequisites. 

In general, a good, but not systematic, way of find-
ing representations is to find objects on which G acts,
and “linearize” the action. We have seen one exam-
ple of this: when G acts on a set X we can consider
the linearized action on $L^2(X)$. Recall that the irre-
ducible G-sets are all of the form G/H, for H some sub-
group of G. As well as looking at $L^2(G/H)$, we can con-
sider, for every representation W of H, the vector space
$L^2(G/H, W) = \{ f : G \to W \mid f(gh) = h^{-1} f(g),\ g \in G,\ h \in H \}$;
in geometric language, for those who pre-
fer it, this is the space of sections of the associated
W-bundle on G/H. This representation of G is called
the induced representation of W from H to G.

Other linearizations are also important. For example,
if G acts continuously on a topological space X, we can
consider how it acts on homology classes and hence
on the homology groups [IV.6 §4] of X.⁶ The simplest
case of this is the map $z \mapsto \bar{z}$ of the circle $S^1$. Since
this map squares to the identity map, it gives us an
action of Z/2Z on $S^1$, which becomes a representation
of Z/2Z on $H_1(S^1) = \mathbb{R}$ (which represents the identity
as multiplication by 1 and the other element of Z/2Z as
multiplication by -1).


6. The homology groups discussed in the article just referred to 
consist of formal sums of homology classes with integer coefficients. 
Here, where a vector space is required, we are taking real coefficients.


Methods like these have been used to determine the
character tables of all finite simple groups [I.3 §3.3],
but they still fall short of a uniform description valid 
for all groups. 

There are many arithmetic properties of the charac-
ter table that hint at properties of the desired non-
Abelian Fourier transform. For example, the size of a
conjugacy class divides the order of the group, and
in fact the dimension of a representation also divides
the order of the group. Pursuing this thought leads to
an examination of the values of the characters mod p,
relating them to the so-called p-local subgroups. These
are groups of the form N(Q)/Q, where Q is a subgroup
of G, the number of elements of Q is a power of p, and
N(Q) is the normalizer of Q (defined to be the largest
subgroup of G that contains Q as a normal subgroup).
When the so-called “p-Sylow subgroup” of G is Abe-
lian, beautiful conjectures of Broué give us an essen-
tially complete picture of the representations of G. But
in general these questions are at the center of a great
deal of contemporary research.

3 Fourier Analysis 

We have justified the study of group actions on vector
spaces by explaining that the theory of representations
has a nice structure that is not present in the theory
of group actions on sets. A more historically based
account would start by saying that spaces of functions
very often come with natural actions of some group
G, and many problems of traditional interest can be
related to the decomposition of these representations
of G.

In this section we will concentrate on the case where
G is a compact Lie group [III.50 §1]. We will see that in
this case many of the nice features of the representa-
tion theory of finite groups persist.

The prototypical example is the space $L^2(S^1)$ of
square-integrable functions on the circle $S^1$. We can
think of the circle as the unit circle in $\mathbb{C}$, and thereby
identify it with the group of rotations of the circle
(since multiplication by $e^{i\theta}$ rotates the circle by $\theta$). This
action linearizes to an action on $L^2(S^1)$: if f is a square-
integrable function defined on $S^1$ and w belongs to the
circle, then $(w \cdot f)(z)$ is defined to be $f(w^{-1}z)$. That
is, $w \cdot f$ does to wz what f does to z.

Classical Fourier analysis expands functions in
$L^2(S^1)$ in terms of a basis of trigonometric functions:
the functions $z^n$ for $n \in \mathbb{Z}$. (These look more “trigono-
metric” if one writes $e^{i\theta}$ for z and $e^{in\theta}$ for $z^n$.) If we
fix w and write $\phi_n(z) = z^n$, then
$(w \cdot \phi_n)(z) = \phi_n(w^{-1}z) = w^{-n}\phi_n(z)$. In particular, $w \cdot \phi_n$ is a mul-
tiple of $\phi_n$ for each w, so the one-dimensional sub-
space generated by $\phi_n$ is invariant under the action of
$S^1$. In fact, every irreducible representation of $S^1$ is of
this form, as long as we restrict attention to continuous
representations.

Now let us consider an innocuous-looking general-
ization of the above situation: we shall replace 1 by n
and try to understand $L^2(S^n)$, the space of complex-
valued square-integrable functions on the n-sphere $S^n$.
The n-sphere is acted on by the group of rotations
SO(n + 1). As usual, this can be converted into a rep-
resentation of SO(n + 1) on the space $L^2(S^n)$, which
we would like to decompose into irreducible repre-
sentations; equivalently, we would like to decompose
$L^2(S^n)$ into a direct sum of minimal SO(n + 1)-invariant
subspaces.

This turns out to be possible, and the proof is very
similar to the proof for finite groups. In particular, a
compact group such as SO(n + 1) has a natural proba-
bility measure [III.73 §2] on it (called Haar measure)
in terms of which we can define averages. Roughly
speaking, the only difference between the proof for
SO(n + 1) and the proof in the finite case is that we
have to replace a few sums by integrals.

The general result that one can prove by this method
is the following. If G is a compact group that acts con-
tinuously on a compact space X (in the sense that each
permutation $\phi(g)$ of X is continuous, and also that
$\phi(g)$ varies continuously with g), then $L^2(X)$ splits
up into an orthogonal direct sum of finite-dimensional
minimal G-invariant subspaces; equivalently, the lin-
earized action of G on $L^2(X)$ splits up into an orthog-
onal direct sum of irreducible representations, all of
which are finite dimensional. The problem of finding a
Hilbert space basis of $L^2(X)$ then splits into two sub-
problems: we must first determine the irreducible rep-
resentations of G, a problem which is independent of
X, and then determine how many times each of these
irreducible representations occurs in $L^2(X)$.

When $G = S^1$ (which we identified with SO(2)) and
$X = S^1$ as well, we saw that these irreducible repre-
sentations were one dimensional. Now let us look at
the action of the compact group SO(3) on $S^2$. It can be
shown that the action of G on $L^2(S^2)$ commutes with
the Laplacian, the differential operator $\Delta$ on $L^2(S^2)$
defined by

\[ \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}. \]


That is, $g(\Delta f) = \Delta(gf)$ for any $g \in G$ and any
(sufficiently smooth) function f. In particular, if f is
an eigenfunction for the Laplacian (which means that
$\Delta f = \lambda f$ for some $\lambda \in \mathbb{C}$), then for each $g \in \mathrm{SO}(3)$ we
have

\[ \Delta gf = g\Delta f = g\lambda f = \lambda gf, \]

so gf is also an eigenfunction for $\Delta$. Therefore, the
space $V_\lambda$ of all eigenvectors for the Laplacian with
eigenvalue $\lambda$ is G-invariant. In fact, it turns out that
if $V_\lambda$ is nonzero then the action of G on $V_\lambda$ is an irre-
ducible representation. Furthermore, each irreducible
representation of SO(3) arises exactly once in this way.
More precisely, we have a Hilbert space direct sum,

\[ L^2(S^2) = \bigoplus_{n \ge 0} V_{2n(2n+2)}, \]

and each eigenspace $V_{2n(2n+2)}$ has dimension 2n + 1.
Note that this is a case where the set of eigenvalues
is discrete. (These eigenspaces are discussed further in
SPHERICAL HARMONICS [III.89].)

The nice feature that each irreducible representation
appears at most once is rather special to the exam-
ple $L^2(S^n)$. (For an example where this does not hap-
pen, recall that with the regular representation $L^2(G)$
of a finite group G each irreducible representation $\rho$
occurs $\dim \rho$ times in $L^2(G)$.) However, other features
are more generic: for example, when a compact Lie
group acts differentiably on a space X, then the sum of
all the G-invariant subspaces of $L^2(X)$ corresponding
to a particular representation is always equal to the set
of common eigenvectors of some family of commuting
differential operators. (In the example above, there was
just one operator, the Laplacian.)

Interesting special functions [III.87], such as solu-
tions of certain differential equations, often admit rep-
resentation-theoretic meaning, for example as matrix
coefficients. Their properties can then easily be de-
duced from general results in functional analysis and
representation theory rather than from any calculation.
Hypergeometric equations, Bessel equations, and many 
integrable systems arise in this way. 

There is more to say about the similarities between
the representation theory of compact groups and that
of finite groups. Given a compact group G and an
irreducible representation $\rho$ of G, we can again take
its trace (since it is finite dimensional) and thereby
define its character $\chi_\rho$. Just as before, $\chi_\rho$ is constant
on each conjugacy class. Finally, “the character table
is square,” in the sense that the characters of the irre-
ducible representations form an orthonormal basis of
the Hilbert space of all square-integrable functions that
are conjugation invariant in this sense. (Now, though,
the “square matrix” is infinite.) When $G = S^1$ this is the
Fourier theorem; when G is finite this is the theorem of
section 2.

4 Noncompact Groups, Groups in Characteristic p, and Lie Algebras

The “character table is square” theorem focuses our 
attention on groups with nice conjugacy-class struc- 
ture. What happens when we take such a group but 
relax the requirement that it be compact? 

A paradigmatic noncompact group is the real num-
bers $\mathbb{R}$. Like $S^1$, $\mathbb{R}$ acts on itself in an obvious way
(the real number t is associated with the translation
$s \mapsto s + t$), so let us linearize that action in the usual way
and look for a decomposition of $L^2(\mathbb{R})$ into $\mathbb{R}$-invariant
subspaces.

In this situation we have a continuous family of irreducible
one-dimensional representations: for each real
number $\lambda$ we can define the function $\chi_\lambda$ by
$\chi_\lambda(x) = e^{2\pi i\lambda x}$. These functions are not square integrable, but
despite this difficulty classical Fourier analysis tells us
that we can write an $L^2$-function in terms of them.
However, since the Fourier modes now vary in a continuous
family, we can no longer decompose a function
as a sum: rather we must use an integral. First,
we define the Fourier transform $\hat{f}$ of $f$ by the formula
$\hat{f}(\lambda) = \int f(x)\,e^{2\pi i\lambda x}\,dx$. The desired decomposition of
$f$ is then $f(x) = \int \hat{f}(\lambda)\,e^{-2\pi i\lambda x}\,d\lambda$. This, the Fourier
inversion formula, tells us that $f$ is a weighted integral
of the functions $\chi_\lambda$. We can also think of it as something
like a decomposition of $L^2(\mathbb{R})$ as a “direct integral”
(rather than direct sum) of the one-dimensional
subspaces generated by the functions $\chi_\lambda$. However,
we must treat this picture with due caution since the
functions $\chi_\lambda$ do not belong to $L^2(\mathbb{R})$.
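As a numerical illustration of the inversion formula (a sketch under stated assumptions, not part of the original argument), the code below approximates both integrals by Riemann sums, using the sign conventions given above. The test function, the truncation to the interval [-6, 6], and the step size are all assumptions, chosen so that the neglected tails are negligible.

```python
import cmath
import math

def integrate(g, half_width=6.0, step=0.05):
    """Riemann sum of g over [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    return step * sum(g(-half_width + k * step) for k in range(n + 1))

def fourier(f, lam):
    """Approximates the integral of f(x) * exp(2*pi*i*lam*x) dx."""
    return integrate(lambda x: f(x) * cmath.exp(2j * math.pi * lam * x))

def invert(f_hat, x):
    """Approximates the integral of f_hat(lam) * exp(-2*pi*i*lam*x) d(lam)."""
    return integrate(lambda lam: f_hat(lam) * cmath.exp(-2j * math.pi * lam * x))

def f(x):
    # Test function (an assumption of this sketch): a shifted Gaussian.
    return math.exp(-math.pi * (x - 0.5) ** 2)

x0 = 0.3
recovered = invert(lambda lam: fourier(f, lam), x0)
print(abs(recovered - f(x0)))  # discrepancy between f(x0) and its reconstruction
```

The reconstruction passes through the non-square-integrable modes only inside the integral, which is the sense in which the "direct integral" picture should be read.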

This example indicates what we should expect in gen- 
eral. If X is a space with a measure and G acts continu- 
ously on it in a way that preserves the measures of sub- 
sets of X (as translations did with subsets of R), then 
the action of G on X gives rise to a measure $\mu_X$ defined
on the set of all irreducible representations, and $L^2(X)$
can be decomposed as the integral over all irreducible 
representations with respect to this measure. A theo- 
rem that explicitly describes such a decomposition is 
called a Plancherel theorem for X. 

For a more complicated but more typical example,
let us look at the action of $SL_2(\mathbb{R})$ (the group of real
2 × 2 matrices with determinant 1) on $\mathbb{R}^2$ and see how
to decompose $L^2(\mathbb{R}^2)$. As we did when we looked at
functions defined on $S^2$, we shall make use of a differential
operator. This involves the small technicality that
we should look at smooth functions, and we do not ask
for them to be defined at the origin. The appropriate
differential operator this time turns out to be the Euler
vector field $x(\partial/\partial x) + y(\partial/\partial y)$. It is not hard to check
that if $f$ satisfies the condition $f(tx, ty) = t^s f(x, y)$
for every x, y, and t > 0, then $f$ is an eigenfunction
of this operator with eigenvalue s, and indeed all functions
in the eigenspace with this eigenvalue, which we
shall denote by $W_s$, are of this form. We can also split
$W_s$ up as $W_s^+ \oplus W_s^-$, where $W_s^+$ and $W_s^-$ consist of the
even and odd functions in $W_s$, respectively.

The easiest way of analyzing the structure of $W_s$ is to
compute the action of the Lie algebra [III.50 §2] $\mathfrak{sl}_2$.
For those readers unfamiliar with Lie algebras, we will
say only that the Lie algebra of a Lie group G keeps
track of the action of elements of G that are “infinitesimally
close to the identity,” and that in this case the
Lie algebra $\mathfrak{sl}_2$ can be identified with the space of 2 × 2
matrices of trace 0, with $\begin{pmatrix} a & b \\ c & -a \end{pmatrix}$ acting as the differential
operator $(-ax - by)(\partial/\partial x) + (-cx + ay)(\partial/\partial y)$.

Every element of $W_s$ is a function on $\mathbb{R}^2$. If we restrict
these functions to the unit circle, then we obtain a map
from $W_s$ to the space of smooth functions defined on
$S^1$, which turns out to be an isomorphism. We already
know that this space has a basis of Fourier modes $z^m$,
which we can now think of as $(x + iy)^m$, defined when
$x^2 + y^2 = 1$. There is a unique extension of this from a
function defined on $S^1$ to a function in $W_s$, namely the
function $w_m(x, y) = (x + iy)^m (x^2 + y^2)^{(s-m)/2}$. One
can then check the following actions of simple matrices
on these functions (to do so, recall the association
of the matrices with differential operators given in the
previous paragraph):
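As a worked check (a sketch obtained by direct computation, with the choice of these three matrices an assumption of the sketch): write $z = x + iy$, so that the function of degree $s$ extending the $m$th Fourier mode is $w_m = z^{(s+m)/2}\,\bar z^{(s-m)/2}$, and apply the operator attached to a trace-zero matrix in the previous paragraph.

```latex
\begin{align*}
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} w_m
  &= \Bigl(-y\,\tfrac{\partial}{\partial x} + x\,\tfrac{\partial}{\partial y}\Bigr) w_m
   = i m\, w_m, \\
\begin{pmatrix} 1 & i \\ i & -1 \end{pmatrix} w_m
  &= -2z\,\tfrac{\partial}{\partial \bar z}\, w_m
   = (m - s)\, w_{m+2}, \\
\begin{pmatrix} 1 & -i \\ -i & -1 \end{pmatrix} w_m
  &= -2\bar z\,\tfrac{\partial}{\partial z}\, w_m
   = -(m + s)\, w_{m-2}.
\end{align*}
```

The coefficients $m - s$ and $-(m + s)$ can vanish only when s is an integer, which is the case distinction on which the rest of the discussion turns.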



It follows that if s is not an integer, then from any function
$w_m$ in $W_s^+$ we can produce all the others using
the action of $SL_2(\mathbb{R})$. Therefore, $SL_2(\mathbb{R})$ acts irreducibly
on $W_s^+$. Similarly, it acts irreducibly on $W_s^-$. We have
therefore encountered a significant difference between
this and the finite/compact case: when G is not compact,
irreducible representations of G can be infinite
dimensional.

Looking more closely at the formulas for $W_s$ when
$s \in \mathbb{Z}$, we see more disturbing differences. In order to
understand these, let us distinguish carefully between 
representations that are reducible and representations 
that are decomposable. The former are representations 
that have nontrivial G-invariant subspaces, whereas 
the latter are representations where one can decom- 
pose the space on which G acts into a direct sum of 
G-invariant subspaces. Decomposable representations 
are obviously reducible. In the finite/compact case, we 
used an averaging process to show that reducible rep- 
resentations are decomposable. Now we do not have 
a natural probability measure to use for the aver- 
aging, and it turns out that there can be reducible 
representations that are not decomposable. 

Indeed, if s is a nonnegative integer, then the subspaces
$W_s^+$ and $W_s^-$ give us an example of this phenomenon.
They are indecomposable (in fact, this is true
even when s is a negative integer not equal to -1) but
they contain an invariant subspace of dimension s + 1.
Thus, we cannot write the representation as a direct 
sum of irreducible representations. (One can do some- 
thing a little bit weaker, however: if we quotient out 
by the (s + 1) -dimensional subspace, then the quotient 
representation can be decomposed.) 

It is important to understand that in order to produce 
these indecomposable but reducible representations 
we worked not in the space $L^2(\mathbb{R}^2)$ but in the space of
smooth functions on $\mathbb{R}^2$ with the origin removed. For
instance, the functions $w_m$ above are not square integrable.
If we look just at representations of G that act
on subspaces of $L^2(X)$, then we can split them up into
a direct sum of irreducibles: given a G-invariant subspace,
its orthogonal complement is also G-invariant.
It might therefore seem best to ignore the other, rather
subtle representations and just look at these ones. But
it turns out to be easier to study all representations
and only later ask which ones occur inside $L^2(X)$. For
$SL_2(\mathbb{R})$, the representations we have just constructed
(which were subquotients of $W_s^{\pm}$) exhaust all the irreducible
representations, 7 and there is a Plancherel formula
for $L^2(\mathbb{R}^2)$ that tells us which ones appear in


7. To make this precise requires some care about what we mean
by “isomorphic.” Because many different topological vector spaces
can have the same underlying $\mathfrak{sl}_2$-module, the correct notion is of
infinitesimal equivalence. Pursuing this notion leads to the category of
Harish-Chandra modules, a category with good finiteness properties.


$L^2(\mathbb{R}^2)$ and with what multiplicity:

$L^2(\mathbb{R}^2) = \int W_{-1+it}\, e^{it}\, dt.$

To summarize: if G is not compact, then we can no 
longer take averages over G. This has various conse- 
quences: 

Representations occur in continuous families. The
decomposition of $L^2(X)$ takes the form of a direct
integral, not a direct sum.

Representations do not split up into a direct sum of
irreducibles. Even when a representation admits a
finite composition series, as with the action of $SL_2(\mathbb{R})$
on $W_s^{\pm}$, it need not split up into a direct sum. So
to describe all representations we need to do more 
than just describe the irreducibles— we also need to 
describe the glue that holds them together. 

So far, the theory of representations of a noncom- 
pact group G seems to have none of the pleasant fea- 
tures of the compact case. But one thing does survive: 
there is still an analogue of the theorem that the char- 
acter table is square. Indeed, we can still define charac- 
ters in terms of the traces of group elements. But now 
we must be careful, since the irreducible representa- 
tion may be on an infinite-dimensional vector space, so 
that its trace cannot be defined so easily. In fact, characters
are not functions on G, but only distributions
[III.18]. The character of a representation $\rho$ determines
the semisimplification of $\rho$: that is, it
tells us which irreducible representations are part of $\rho$,
but not how they are glued together. 8

These phenomena were discovered by Harish-Chan- 
dra in the 1950s in an extraordinary series of works that 
completely described the representation theory of Lie 
groups such as the ones we have discussed (the precise 
condition is that they should be real and reductive— 
a concept that will be explained later in this article) 
and the generalizations of classical theorems of Fourier 
analysis to this setting. 9 

Independently and slightly earlier, Brauer had inves- 
tigated the representation theory of finite groups on 
finite-dimensional vector spaces over fields of characteristic
p. Here, too, reducible representations need
not decompose as direct sums, though in this case the 


8. It is a major theorem of Harish-Chandra that the distribution that 
defines a character is given by analytic functions on a dense subset of
the semisimple elements of the group. 

9. The problem of determining the irreducible unitary represen- 
tations for real reductive groups has still not been solved; the most 
complete results are due to Vogan. 
problem is not lack of compactness (obviously, since
everything is finite) but an inability to average over the
group: we would like to divide by |G|, but often this
is zero. A simple example that illustrates this is the
action of $\mathbb{Z}/p\mathbb{Z}$ on the space $\mathbb{F}_p^2$ that takes x to the 2 × 2
matrix $\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$. This is reducible, since the column vector
$\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ is fixed by the action, and therefore generates
an invariant subspace. However, if one could decompose
the action, then the matrices $\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$ would all be
diagonalizable, which they are not.
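The example can be checked by brute force for a small prime. The following sketch (the function names are hypothetical, and p is assumed prime) lists the lines in $\mathbb{F}_p^2$ that are preserved by every matrix of the form above; a decomposition into two invariant lines would require two such lines.

```python
def mat_vec(m, v, p):
    """Apply a 2x2 matrix to a column vector, working modulo p."""
    return ((m[0][0] * v[0] + m[0][1] * v[1]) % p,
            (m[1][0] * v[0] + m[1][1] * v[1]) % p)

def invariant_lines(p):
    """Lines in F_p^2 preserved by every matrix [[1, x], [0, 1]], x in Z/pZ."""
    # One representative vector per line: (1, 0) and (t, 1) for t in F_p.
    representatives = [(1, 0)] + [(t, 1) for t in range(p)]
    lines = []
    for v in representatives:
        span = {((c * v[0]) % p, (c * v[1]) % p) for c in range(p)}
        if all(mat_vec([[1, x], [0, 1]], v, p) in span for x in range(p)):
            lines.append(v)
    return lines

print(invariant_lines(5))
```

Only the line spanned by the fixed column vector survives, so the representation is reducible but has no complementary invariant line.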

It is possible for there to be infinitely many indecomposable
representations, which again may vary in families.
However, as before, there are only finitely many
irreducible representations, so there is some chance of
a “character table is square” theorem in which the rows 
of the square are parametrized by characters of irre- 
ducible representations. Brauer proved just such a the- 
orem, pairing the characters with p-semisimple conju- 
gacy classes in G: that is, conjugacy classes of elements 
whose order is not divisible by p. 
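For the symmetric group the conjugacy-class side of this pairing is easy to make explicit, since a permutation has order prime to p exactly when none of its cycle lengths is divisible by p. A minimal sketch, assuming p is prime and $G = S_n$ (the function names are hypothetical):

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest

def p_regular_classes(n, p):
    """Cycle types in S_n whose elements have order not divisible by the prime p."""
    return [lam for lam in partitions(n) if all(part % p for part in lam)]

print(len(list(partitions(4))), len(p_regular_classes(4, 2)))
```

For $S_4$ this compares the five ordinary conjugacy classes with the two classes of elements of odd order, the latter being the count relevant in characteristic 2.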

We will draw two crude morals from the work of 
Harish-Chandra and of Brauer. The first is that the category
of representations of a group is always a reasonable
object, but when the representations are infinite
dimensional it requires serious technical work to set it
up. Objects in this category do not necessarily decompose
as a direct sum of irreducibles (one says that the
category is not semisimple), and can occur in infinite
families, but irreducible objects pair off in some precise 
way with certain “diagonalizable” conjugacy classes in 
the group — there is always some kind of analogue of 
“the character table is square” theorem. 

It turns out that when we consider representations 
in more general contexts (Lie algebras acting on vector
spaces, quantum groups, p-adic groups on infinite-dimensional
complex or p-adic vector spaces, etc.),
these qualitative features stay the same.

The second moral is that we should always hope 
for some “non-Abelian Fourier transform”: that is, a
set that parametrizes irreducible representations and 
a description of the character values in terms of this 
set. 

In the case of real reductive groups Harish-Chandra’s 
work provides such an answer, generalizing the Weyl 
character formula for compact groups; for arbitrary 
groups no such answer is known. For special classes 
of groups, there are partially successful general princi- 
ples (the orbit method, Broué’s conjecture), of which 
the deepest are the extraordinary circle of conjectures
known as the Langlands program, which we shall
discuss later.

5 Interlude: The Philosophical Lessons of 
“The Character Table Is Square” 

Our basic theorem (“the character table is square”) tells 
us to expect that the category of all irreducible rep- 
resentations of G is interesting when the conjugacy- 
class structure of G is in some way under control. We 
will finish this essay by explaining a remarkable fam- 
ily of examples of such groups— the rational points of 
reductive algebraic groups — and their conjectured rep- 
resentation theory, which is described by the Langlands 
program. 

An affine algebraic group is a subgroup of some
group $GL_n$ that is defined by polynomial equations in
the matrix coefficients. For example, the determinant
of a matrix is a polynomial in the matrix coefficients,
so the group $SL_n$, which consists of all matrices in $GL_n$
with determinant 1, is such a group. Another is $SO_n$,
which is the set of matrices with determinant 1 that
satisfy the equation $AA^T = I$.

The above notation did not specify what sort of coef- 
ficients we were allowing for the matrices. That vague- 
ness was deliberate. Given an algebraic group G and 
a field k, let us write G(k) for the group where the
coefficients are taken to have values in k. For example,
$SL_n(\mathbb{F}_q)$ is the set of n × n matrices with coefficients
in the finite field $\mathbb{F}_q$ and determinant 1. This
group is finite, as is $SO_n(\mathbb{F}_q)$, while $SL_n(\mathbb{R})$ and $SO_n(\mathbb{R})$
are Lie groups. Moreover, $SO_n(\mathbb{R})$ is compact, while
$SL_n(\mathbb{R})$ is not. So among affine algebraic groups over
fields one already finds all three types of groups we
have discussed: finite groups, compact Lie groups, and
noncompact Lie groups.
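For a small prime p the finite group $SL_2(\mathbb{F}_p)$ can simply be listed from its defining polynomial equation. The sketch below assumes q = p is prime, so that arithmetic modulo p models the field; the function name is hypothetical.

```python
from itertools import product

def sl2_elements(p):
    """All 2x2 matrices (a, b, c, d) over Z/pZ with ad - bc = 1 (mod p)."""
    return [(a, b, c, d)
            for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

for p in (2, 3, 5):
    # The count agrees with the standard formula p * (p**2 - 1).
    print(p, len(sl2_elements(p)), p * (p * p - 1))
```

The same polynomial condition, read over $\mathbb{R}$ instead of $\mathbb{Z}/p\mathbb{Z}$, cuts out the noncompact Lie group $SL_2(\mathbb{R})$; only the field of coefficients has changed.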

We can think of $SL_n(\mathbb{R})$ as the set of matrices in
$SL_n(\mathbb{C})$ that are equal to their complex conjugates.
There is another involution on $SL_n(\mathbb{C})$ that is a sort
of “twisted” form of complex conjugation, where we
send a matrix A to the complex conjugate of $(A^{-1})^T$.
The fixed points of this new involution (that is, the
determinant-1 matrices A such that A equals the complex
conjugate of $(A^{-1})^T$) form a group called $SU_n(\mathbb{R})$.
This is also called a real form of $SL_n(\mathbb{C})$, 10 and it is
compact.


10. When we say that $SL_n(\mathbb{R})$ and $SU_n(\mathbb{R})$ are both “real forms” of
$SL_n(\mathbb{C})$, what is meant more precisely is that in both cases the group
can be described as a subgroup of some group of real matrices that
consists of all solutions to a set of polynomial equations, and that
when the same set of equations is applied instead to the group of
complex matrices the result is isomorphic to $SL_n(\mathbb{C})$.
The groups $SL_n(\mathbb{F}_q)$ and $SO_n(\mathbb{F}_q)$ are almost simple
groups; 11 the classification of finite simple groups tells
us, mysteriously, that all but twenty-six of the finite
simple groups are of this form. A much, much easier 
theorem tells us that the connected compact groups are 
also of this form. 

Now, given an algebraic group G, we can also consider
the instances $G(\mathbb{Q}_p)$, where $\mathbb{Q}_p$ is the field of
p-adic numbers, and also $G(\mathbb{Q})$. For that matter, we may
consider G(k) for any other field k, such as the function
field of an algebraic variety [V.33]. The lesson
of section 4 is that we may hope for all of these
many groups to have a good representation theory,
but that to obtain it there will be serious “analytic” or
“arithmetic” difficulties to overcome, which will depend
strongly on the properties of the field k.

Lest the reader adopt too optimistic a viewpoint, we 
point out that not every affine algebraic group has a 
nice conjugacy-class structure. For example, let $V_n$ be
the set of upper triangular matrices in $GL_n$ with 1s
along the diagonal, and let k be $\mathbb{F}_q$. For large n, the conjugacy
classes in $V_n(\mathbb{F}_q)$ form large and complex families:
to parametrize them sensibly one needs more than
n parameters (in other words, they belong to families
of dimension greater than n, in an appropriate sense),
and it is not in fact known how to parametrize them
even for a smallish value of n, such as 11. (It is not 
obvious that this is a “good” question though.) 
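For very small cases the classes can at least be counted by brute force. The sketch below assumes q = p is prime and uses the standard fact that the number of conjugacy classes of a finite group equals the number of commuting pairs of elements divided by the order of the group; the function names are hypothetical.

```python
from itertools import product

def unitriangular(n, p):
    """All n x n upper triangular matrices over Z/pZ with 1s on the diagonal."""
    slots = [(i, j) for i in range(n) for j in range(i + 1, n)]
    matrices = []
    for values in product(range(p), repeat=len(slots)):
        m = [[int(i == j) for j in range(n)] for i in range(n)]
        for (i, j), v in zip(slots, values):
            m[i][j] = v
        matrices.append(tuple(tuple(row) for row in m))
    return matrices

def multiply(a, b, p):
    """Matrix product modulo p."""
    n = len(a)
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(n)) % p
                       for j in range(n)) for i in range(n))

def class_count(n, p):
    """Number of conjugacy classes, computed as (commuting pairs) / |G|."""
    group = unitriangular(n, p)
    commuting = sum(multiply(g, h, p) == multiply(h, g, p)
                    for g in group for h in group)
    return commuting // len(group)

print(class_count(3, 2), class_count(3, 3))
```

This only counts classes; it says nothing about how to parametrize them, which is the difficulty described above, and the quadratic cost in the group order makes it impractical beyond tiny n and p.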

More generally, solvable groups tend to have horrible 
conjugacy-class structure, even when the groups them- 
selves are “sensible.” So we might expect their repre- 
sentation theory to be similarly horrible. The best we 
can hope for is a result that describes the entries of 
the character table in terms of this horrible structure— 
some kind of non-Abelian Fourier integral. For certain 
p-groups Kirillov found such a result in the 1960s, as 
an example of the “orbit method,” but the general result 
is not yet known. 

On the other hand, groups that are similar to connected
compact groups do have a nice conjugacy-class
structure: in particular, finite simple groups do. An
algebraic group is called reductive if $G(\mathbb{C})$ has a compact
real form. So, for instance, $SL_n$ is reductive by the
existence of the real form $SU_n(\mathbb{R})$. The groups $GL_n$ and
$SO_n$ are also reductive, but $V_n$ is not. 12


11. Which is to say that the quotient of these groups by their center
is simple. 

12. The miracle, not relevant for this discussion, is that compact 
connected groups can be easily classified. Each one is essentially a 
product of circles and non-Abelian simple compact groups. The latter 


Let us examine the conjugacy classes in the group
$SU_n$. Every matrix in $SU_n(\mathbb{R})$ can be diagonalized, and
two conjugate matrices have the same eigenvalues, up
to reordering. Conversely, any two matrices in $SU_n(\mathbb{R})$
with the same eigenvalues are conjugate. Therefore, the
conjugacy classes are parametrized by the quotient of
the subgroup of all diagonal matrices by the action of
$S_n$ that permutes the entries.

This example can be generalized. Any compact con- 
nected group has a maximal torus T, that is, a maximal 
subgroup isomorphic to a product of circles. (In the pre- 
vious example it was the subgroup of diagonal matri- 
ces.) Any two maximal tori are conjugate in G, and any 
conjugacy class in G intersects T in a unique W-orbit on
T, where W is the Weyl group, the finite group N(T)/T 
(where N(T) is the normalizer of T). 

The description of conjugacy classes in G(k), for an
algebraically closed field k, is only a little more complicated.
Any element $g \in G(k)$ admits a Jordan decomposition
[III.45]: it can be written as g = su = us,
where s is conjugate to an element of T(k) and u is
unipotent when considered as an element of $GL_n(k)$.
(A matrix A is unipotent if some power of A - I is
zero.) Unipotent elements never intersect compact subgroups.
When $G = GL_n$ this is the usual Jordan decomposition;
conjugacy classes of unipotent elements are
parametrized by partitions of n, which, as we mentioned
in section 2, are precisely the conjugacy classes
of $W = S_n$. For general reductive groups, unipotent conjugacy
classes are again almost the same thing as conjugacy
classes in W. 13 In particular, there are finitely
many, independent of k.

Finally, when k is not algebraically closed, one 
describes conjugacy classes by a kind of Galois descent; 
for example, in $GL_n(k)$, semisimple classes are still
determined by their characteristic polynomial, but the
fact that this polynomial has coefficients in k constrains
the possible conjugacy classes.

The point of describing the conjugacy-class structure 
in such detail is to describe the representation theory 
in analogous terms. A crude feature of the conjugacy-class
structure is the way it decouples the field k from
finite combinatorial data that is attached to G but independent
of k: things like W, the lattice defining T,
roots, and weights. 


are parametrized by Dynkin diagrams [III.50 §3]. They are $SU_n$, $Sp_{2n}$,
$SO_n$, and five others, denoted $E_6$, $E_7$, $E_8$, $F_4$, and $G_2$. That is it!

13. They are different, but related. Precisely, they are given by combinatorial
data, Lusztig’s two-sided cells for the corresponding affine
Weyl group. 
The “philosophy” suggested by the theorem that the
character table is square suggests that the represen- 
tation theory should also admit such a decoupling: it 
should be built out of the representation theory of 
k*, which is the analogue of the circle, and out of 
the combinatorial structure of G(k) (such as the finite 
groups W). Moreover, representations should have a 
“Jordan decomposition”: 14 the “unipotent” represen- 
tations should have some kind of combinatorial com- 
plexity but little dependence on k, and compact groups 
should have no unipotent representations. 

The Langlands program provides a description along 
the lines laid out above, but it goes beyond any of the 
results we have suggested in that it also describes the 
entries of the character table. Thus, for this class of 
examples, it gives us (conjecturally) the hoped-for “non- 
Abelian Fourier transform.” 

6 Coda: The Langlands Program 

And so we conclude by just hinting at statements. 
If G(k) is a reductive group, we want to describe an 
appropriate category of representations for G(k), or at
least the character table, which we may think of as a 
“semisimplification” of that category. 

Even when k is finite, it is too much to hope that con- 
jugacy classes in G(k) parametrize irreducible repre- 
sentations. But something not so far off is conjectured, 
as follows. 

To a reductive group G over an algebraically closed
field, Langlands attaches another reductive group ${}^L G$,
the Langlands dual, and conjectures that representations
of G(k) will be parametrized by conjugacy classes
in ${}^L G(\mathbb{C})$. 15 However, these are not conjugacy classes
of elements of ${}^L G(\mathbb{C})$, as before, but of homomorphisms
from the Galois group of k to ${}^L G$. The Langlands dual
was originally defined in a combinatorial manner, but
there is now a conceptual definition. A few examples
of pairs $(G, {}^L G)$ are $(GL_n, GL_n)$, $(SO_{2n+1}, Sp_{2n})$, and
$(SL_n, PGL_n)$.

In this way the Langlands program describes the rep- 
resentation theory as built out of the structure of G and 
the arithmetic of k. 


14. The first such theorems were proved for $GL_n(\mathbb{F}_q)$ by Green and
Steinberg. However, the notion of Jordan decomposition for characters
originates with Brauer, in his work on modular representation
theory. It is part of his modular analogue of the “character table is
square” theorem, which we mentioned in section 3.

15. The $\mathbb{C}$ here is because we are looking at representations on complex
vector spaces; if we were looking at representations on vector
spaces over some field F, we would take ${}^L G(F)$.
Though this description indicates the flavor of the
conjectures, it is not quite correct as stated. For
instance, one has to modify the Galois group 16 in such
a way that the correspondence is true for the group
$GL_1(k) = k^*$. When $k = \mathbb{R}$, we get the representation
theory of $\mathbb{R}^*$ (or its compact form $S^1$), which is Fourier
analysis; on the other hand, when k is a p-adic local
field, the representation theory of $k^*$ is described by
local class field theory. We already see an extraordinary
aspect of the Langlands program: it precisely unifies
and generalizes harmonic analysis and number theory.

The most compelling conjectural versions of the 
Langlands program are “equivalences of derived cat- 
egories” between the category of representations and 
certain geometric objects on the spaces of Lang- 
lands parameters. These conjectural statements are the 
hoped-for Fourier transforms. 

Though much progress has been made, a large part of 
the Langlands program remains to be proved. For finite 
reductive groups, slightly weaker statements have been 
proved, mostly by Lusztig. As all but twenty-six of the 
finite simple groups arise from reductive groups, and 
as the sporadic groups have had their character tables 
computed individually, this work already determines 
the character tables of all the finite simple groups. 

For groups over R, the work of Harish-Chandra and 
later authors again confirms the conjectures. But for 
other fields, only fragmentary theorems have been
proved. There is much still to be done. 

Further Reading 

A nice introductory text on representation theory is 
Alperin’s Local Representation Theory (Cambridge Uni- 
versity Press, Cambridge, 1993). As for the Langlands 
program, the 1979 American Mathematical Society vol- 
ume titled Automorphic Forms, Representations, and 
L-functions (but universally known as “The Corvallis 
Proceedings”) is more advanced, and as good a place 
to start as any. 


IV.10 Geometric and Combinatorial
Group Theory 

Martin R. Bridson 


16. The appropriately modified Galois group is called the Weil-Deligne
group.
1 What Are Combinatorial and
Geometric Group Theory? 

Groups and geometry are ubiquitous in mathematics, 
groups because the symmetries (or automorphisms 
[I.3 §4.1]) of any mathematical object in any context
form a group and geometry because it allows one to 
think intuitively about abstract problems and to orga- 
nize families of objects into spaces from which one may 
gain some global insight. 

The purpose of this article is to introduce the reader 
to the study of infinite, discrete groups. I shall discuss 
both the combinatorial approach to the subject that 
held sway for much of the twentieth century and the 
more geometric perspective that has led to an enor- 
mous flowering of the subject in the last twenty years. I 
hope to convince the reader that the study of groups is 
a concern for all of mathematics rather than something 
that belongs particularly to the domain of algebra. 

The principal focus of geometric group theory is the 
interaction of geometry/topology and group theory, 
through group actions and through suitable transla- 
tions of geometric concepts into group theory. One 
wants to develop and exploit this interaction for the 
benefit of both geometry/topology and group theory. 
And, in keeping with our assertion that groups are 
important throughout mathematics, one hopes to illu- 
minate and solve problems from elsewhere in mathe- 
matics by encoding them as problems in group theory. 

Geometric group theory acquired a distinct identity
in the late 1980s but many of its principal ideas have 
their roots in the end of the nineteenth century. At 
that time, low-dimensional topology and combinato- 
rial group theory emerged entwined. Roughly speak- 
ing, combinatorial group theory is the study of groups 
defined in terms of presentations, that is, by means of 
generators and relations. In order to follow the rest of 
this introduction the reader must first understand what 
these terms mean. Since their definitions would require 
an unacceptably long break in the flow of our discus- 
sion, I will postpone them to the next section, but I 
strongly advise the reader who is unfamiliar with the 
meaning of the expression $\Gamma = \langle a_1, \ldots, a_n \mid r_1, \ldots, r_m \rangle$
to pause and read that section before continuing with 
this one. 

The rough definition of combinatorial group theory 
just given misses the point that, like many parts of 
mathematics, it is a subject defined more by its core 
problems and its origins than by its fundamental definitions.
The initial impetus for the subject came from
the description of discrete groups of hyperbolic isometries
and, most particularly, the discovery of the fundamental
group [IV.6 §2] of a manifold [I.3 §6.9] by
Poincaré [VI.61] in 1895. The group-theoretic issues
that emerged were brought into sharp focus by the
work of Tietze and Dehn in the first decade of the twen- 
tieth century and drove much of combinatorial group 
theory for the remainder of the century. 

Not all of the epoch-defining problems came from 
topology: other areas of mathematics threw up funda- 
mental questions as well. Here are some of the forms 
they took: Does there exist a group of the following 
type? Which groups have the following property? What 
are the subgroups of . . .? Is the following group infinite? 
When can one determine the structure of a group from 
its finite quotients? In the sections that follow I shall 
attempt to illustrate the mathematical culture associ- 
ated with questions of this kind, but let me immedi- 
ately mention some easily stated but difficult classical 
problems, (i) Let G be a group that is finitely gener- 
ated and suppose that there is some positive integer n 
such that x n = 1 for every x in G. Must G be finite? 
(ii) Is there a finitely presented group T and a surjec- 
tive homomorphism 4> ■ T — ■ T such that </>(y) = 1 for 
some y =t= 1? (iii) Does there exist a finitely presented, 
infinite, simple group [1.3 §3.3]? (iv) Is every countable 
group isomorphic to a subgroup of a finitely generate d 
group, or even a finitely presented group? 

The first of these questions was asked by Burnside 
in 1902 and the second by Hopf in connection with 
his study of degree-1 maps between manifolds. I shall 
present the answers to all four questions (in section 5)
to illustrate an important aspect of both combinatorial 
and geometric group theory: one develops techniques 
that allow the construction of explicit groups with pre- 
scribed properties. Such constructions are of particular 
interest when they illustrate the diversity of possible 
phenomena in other branches of mathematics. 

Another kind of question that raises basic issues 
in combinatorial group theory takes the form: Does 
there exist an algorithm to determine whether or not 
a group (or given elements of a group) has such-and- 
such a property? For example, does there exist an algo- 
rithm that can take any finite presentation and decide 
in a finite number of steps whether or not the group 
presented is trivial? Questions of this type led to a 
profound and mutually beneficial interaction between 
group theory and logic, given full voice by the Hig- 
man embedding theorem, which we shall discuss in 
section 6. Moreover, via the conduit of combinatorial 
group theory, logic has influenced topology as well:
one uses group-theoretic constructions to show, for 
example, that there is no algorithm to determine which 
pairs of compact triangulated manifolds are homeo- 
morphic in dimensions 4 and above. This shows that 
certain kinds of classification results that have been 
obtained in two and three dimensions do not have 
higher-dimensional analogues. 

One might reasonably regard combinatorial group 
theory as the attempt to develop algebraic techniques 
to solve the types of questions described above, and in 
the course of doing so to identify classes of groups that 
are worthy of particular study. This last point, the ques- 
tion of which groups deserve our attention, is tackled 
head-on in the final section of this article. 

Some of the triumphs of combinatorial group theory 
are intrinsically combinatorial in nature, but many 
more have had their true nature revealed by the intro- 
duction of geometric techniques in the past twenty 
years. A fine example of this is the way in which Gromov’s
insights have connected algorithmic problems in
group theory to so-called filling problems in Riemannian
geometry. Moreover, the power of geometric group
theory is by no means confined to improving the techniques
of combinatorial group theory: it naturally leads
one to think about many other issues of fundamental
importance. For example, it provides a context in which
one can illuminate and vastly extend classical rigidity
theorems [V.26], such as that of Mostow. The key to
applications such as this is the idea that finitely generated
groups can usefully be regarded as geometric
objects in their own right. This idea has its origins in
the work of Cayley [VI.46] (1878) and Dehn (1905) but
its full force was recognized and promoted by Gromov, 
starting in the 1980s. It is the key idea that underpins 
the later sections of this article. 

2 Presenting Groups 

How should one describe a group? An example will 
illustrate the standard way of doing so and give some 
idea of why it is often appropriate. 

Consider the familiar tiling of the Euclidean plane
by equilateral triangles. How might you describe the
full group $\Gamma_\Delta$ of symmetries of this tiling, i.e., the rigid
motions of the plane that send tiles to tiles? Let us focus
on a single tile $T$ and a particular edge $e$ of $T$, and use
this to pick out three symmetries. The first, which we
shall call $\alpha$, is the reflection of the plane in the line that
contains $e$, and the other two, $\beta$ and $\gamma$, are the reflections
in the lines that join the endpoints of $e$ to the
midpoints of the opposite edges in $T$. With some effort
one can convince oneself that every symmetry of the
tiling can be obtained by performing these three operations
repeatedly in a suitable order. One expresses this
by saying that the set $\{\alpha, \beta, \gamma\}$ generates the group $\Gamma_\Delta$.

A further useful observation is that if one performs 
the operation $\alpha$ twice, the tiling is returned to its original
position: that is, $\alpha^2 = 1$. Likewise, $\beta^2 = \gamma^2 = 1$. One
can also verify that $(\alpha\beta)^6 = (\alpha\gamma)^6 = (\beta\gamma)^3 = 1$.
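
As an informal numerical check of these relations, the three reflections can be modelled as maps of the complex plane. The sketch below is an illustration only: it assumes a particular placement of the tile (vertices at $0$, $1$, and $e^{i\pi/3}$, with $e$ the edge from $0$ to $1$), and the helper names `reflection`, `compose`, and `order` are hypothetical.

```python
import cmath
import math

# An isometry of the plane is stored as (a, b, flip):
#   z -> a*z + b          if flip is False
#   z -> a*conj(z) + b    if flip is True
IDENTITY = (1 + 0j, 0j, False)

def reflection(point, angle):
    """Reflection in the line through `point` making `angle` with the x-axis."""
    a = cmath.exp(2j * angle)
    return (a, point - a * point.conjugate(), True)

def compose(f, g):
    """Return the isometry f∘g (apply g first, then f)."""
    a1, b1, flip1 = f
    a2, b2, flip2 = g
    if flip1:
        a2, b2 = a2.conjugate(), b2.conjugate()
    return (a1 * a2, a1 * b2 + b1, flip1 != flip2)

def is_identity(f, tol=1e-9):
    a, b, flip = f
    return (not flip) and abs(a - 1) < tol and abs(b) < tol

def order(f, limit=12):
    """Smallest k >= 1 with f^k = identity, or None if k exceeds `limit`."""
    power = IDENTITY
    for k in range(1, limit + 1):
        power = compose(power, f)
        if is_identity(power):
            return k
    return None

# Assumed placement: T has vertices 0, 1, exp(i*pi/3); e is the edge from 0 to 1.
alpha = reflection(0j, 0.0)                   # the line containing e
beta = reflection(0j, math.pi / 6)            # through 0 and the opposite midpoint
gamma = reflection(1 + 0j, 5 * math.pi / 6)   # through 1 and the opposite midpoint

orders = {
    "alpha": order(alpha),
    "beta": order(beta),
    "gamma": order(gamma),
    "alpha*beta": order(compose(alpha, beta)),
    "alpha*gamma": order(compose(alpha, gamma)),
    "beta*gamma": order(compose(beta, gamma)),
}
```

The computation confirms only that the listed relations hold for these particular maps; it says nothing about whether further, independent relations exist, which is the point taken up next.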

It turns out that the group $\Gamma_\Delta$ is completely determined
by these facts alone, a statement that we summarize
by the notation

$$\Gamma_\Delta = \langle \alpha, \beta, \gamma \mid \alpha^2, \beta^2, \gamma^2, (\alpha\beta)^6, (\alpha\gamma)^6, (\beta\gamma)^3 \rangle.$$

The aim of the rest of this section is to say in more 
detail what this means. 

To begin with, notice that from the facts we are given 
we can deduce others: for example, bearing in mind that 
$\beta^2 = \gamma^2 = (\beta\gamma)^3 = 1$, we can show that

$$(\gamma\beta)^3 = (\gamma\beta)^3 (\beta\gamma)^3 = 1$$

as well (where the last equality follows after repeatedly
canceling pairs of the form $\beta\beta$ or $\gamma\gamma$). We wish
to convey the idea that in $\Gamma_\Delta$ there are no relationships
between the generators except those that follow from 
the facts above by this kind of argument. 
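
Written out in full, with $\beta$ and $\gamma$ denoting the two reflections through the endpoints of $e$, the cancellation goes as follows. This is only an expanded version of the argument just given: the first equality uses $(\beta\gamma)^3 = 1$, and each later step removes one bracketed pair $\beta\beta$ or $\gamma\gamma$.

```latex
\begin{align*}
(\gamma\beta)^3
  &= (\gamma\beta)^3(\beta\gamma)^3
   = \gamma\beta\gamma\beta\gamma(\beta\beta)\gamma\beta\gamma\beta\gamma \\
  &= \gamma\beta\gamma\beta(\gamma\gamma)\beta\gamma\beta\gamma
   = \gamma\beta\gamma(\beta\beta)\gamma\beta\gamma \\
  &= \gamma\beta(\gamma\gamma)\beta\gamma
   = \gamma(\beta\beta)\gamma
   = \gamma\gamma
   = 1.
\end{align*}
```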

Now let us try to say this more formally. We define
a set of generators for a group $\Gamma$ to be a subset $S \subset \Gamma$
such that every element of $\Gamma$ is equal to some product of
elements of $S$ and their inverses. That is, every element
can be written in the form $s_1^{\epsilon_1} s_2^{\epsilon_2} \cdots s_n^{\epsilon_n}$, where each $s_i$
is an element of $S$ and each $\epsilon_i$ is $1$ or $-1$. We then call
a product of this kind a relation if it is equal to the
identity in $\Gamma$.

There is an awkward ambiguity here. When we talk
about “the product” of some elements of $\Gamma$, it sounds
as though we are referring to another element of $\Gamma$, but
we certainly did not mean this at the end of the last
paragraph: a relation is not the identity element of $\Gamma$
but rather a string of symbols such as $ab^{-1}a^{-1}bc$ that
yields the identity in $\Gamma$ when you interpret $a$, $b$, and $c$
as generators in the set $S$. In order to be clear about
this, it is useful to define another group, known as the
free group $F(S)$.

For concreteness we shall describe the free group 
with three generators, taking our set S to be {a, b, c}. 
A typical element is a “word” in the elements of S 
and their inverses, such as the expression $ab^{-1}a^{-1}bc$
considered in the previous paragraph. However, we
sometimes regard two words as the same: for instance,
$abcc^{-1}ac$ and $abab^{-1}bc$ are the same because they
become identical when we cancel out the inverse pairs
$cc^{-1}$ and $b^{-1}b$. More formally, we define two such
words to be equivalent and say that the elements of the
free group are the equivalence classes [1.2 §2.3]. To
multiply words together, we just concatenate them: for
instance, the product of $ab^{-1}$ and $bcca$ is $ab^{-1}bcca$,
which we can shorten to $acca$. The identity is the
“empty word.” This is the free group on three generators
$a$, $b$, and $c$. It should be clear how to generalize it to
an arbitrary set $S$, though we shall continue to discuss
the set $S = \{a, b, c\}$.
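
The cancellation of inverse pairs is easy to mechanize. The following is a minimal sketch, assuming a word is stored as a list of (generator, exponent) pairs with exponent $+1$ or $-1$; the compact string notation read by `parse` is a hypothetical convenience, not standard notation.

```python
def free_reduce(word):
    """Cancel adjacent inverse pairs such as c c^-1 until none remain.

    A word is a sequence of (generator, exponent) pairs, exponent +1 or -1.
    """
    stack = []
    for gen, exp in word:
        if stack and stack[-1] == (gen, -exp):
            stack.pop()                  # x x^-1 or x^-1 x cancels
        else:
            stack.append((gen, exp))
    return stack

def multiply(u, v):
    """Product in the free group: concatenate, then reduce."""
    return free_reduce(list(u) + list(v))

def parse(text):
    """Read a string such as 'a b- b c c a', where '-' marks an inverse."""
    return [(tok[0], -1 if tok.endswith("-") else 1) for tok in text.split()]

# The two words from the text that are regarded as the same.
same = free_reduce(parse("a b c c- a c")) == free_reduce(parse("a b a b- b c"))

# The product of a b^-1 and b c c a, which shortens to a c c a.
product = multiply(parse("a b-"), parse("b c c a"))
```

Two words are equivalent exactly when their reduced forms coincide, so comparing the outputs of `free_reduce` decides equality in the free group.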

A more abstract way of characterizing the free group
on $a$, $b$, and $c$ is to say that it has the following universal
property: if $G$ is any group and $\phi$ is any function from
$S = \{a, b, c\}$ to $G$, then there is a unique homomorphism
$\Phi$ from $F(S)$ to $G$ that takes $a$ to $\phi(a)$, $b$ to $\phi(b)$,
and $c$ to $\phi(c)$. Indeed, if we want $\Phi$ to have these properties,
then our definition is forced upon us: for example,
$\Phi(ab^{-1}ca)$ will have to be $\phi(a)\phi(b)^{-1}\phi(c)\phi(a)$,
by the definition of a homomorphism. So the uniqueness
is obvious. The rough reason that this definition
really does give rise to a well-defined homomorphism
is that the only equations that are true in $F(S)$ are ones
that are true in all groups: in order for $\Phi$ not to be a
homomorphism, one would need a relation to hold in
$F(S)$ that did not hold in $G$, but this is impossible.
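
To see the forced definition in action, one can take $G$ to be a small permutation group and evaluate words letter by letter. The sketch below is illustrative: the target group (permutations of $\{0, 1, 2\}$) and the images chosen for $a$, $b$, $c$ are arbitrary assumptions, and `extend` is a hypothetical name for the induced map.

```python
def compose(p, q):
    """Product p*q of permutations given as tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def extend(phi, word):
    """Value of the induced homomorphism on a word of (generator, +-1) pairs."""
    n = len(next(iter(phi.values())))
    result = tuple(range(n))             # the identity permutation
    for gen, exp in word:
        factor = phi[gen] if exp == 1 else inverse(phi[gen])
        result = compose(result, factor)
    return result

# An arbitrary choice of images in the symmetric group on {0, 1, 2}.
phi = {"a": (1, 0, 2), "b": (0, 2, 1), "c": (1, 2, 0)}

u = [("a", 1), ("b", -1)]      # the word a b^-1
v = [("c", 1), ("a", 1)]       # the word c a

# Multiplicativity, and insensitivity to an inserted inverse pair c c^-1.
multiplicative = extend(phi, u + v) == compose(extend(phi, u), extend(phi, v))
well_defined = extend(phi, u + [("c", 1), ("c", -1)] + v) == extend(phi, u + v)
```

Both checks succeed for any choice of `phi`, which reflects the fact that the only cancellations available in $F(S)$ are ones that are valid in every group.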

Now let us return to our example $\Gamma_\Delta$. We would like
to prove that it is (isomorphic to) the “freest” group
with generators $\alpha$, $\beta$, and $\gamma$ that satisfies the relations
$\alpha^2 = \beta^2 = \gamma^2 = (\alpha\beta)^6 = (\alpha\gamma)^6 = (\beta\gamma)^3 = 1$. But
what exactly is this “freest” group that we are claiming
is isomorphic to $\Gamma_\Delta$?

To avoid confusion about the meaning of $\alpha$, $\beta$, and
$\gamma$ (are they elements of $\Gamma_\Delta$ or of the group that we
are trying to construct that will turn out to be isomorphic
to $\Gamma_\Delta$?) we shall use the letters $a$, $b$, and $c$
when we answer this question. Thus, we are trying to
build the “freest” group with generators $a$, $b$, and $c$
that satisfies the relations $a^2 = b^2 = c^2 = (ab)^6 = (ac)^6 = (bc)^3 = 1$,
which we denote by
$G = \langle a, b, c \mid a^2, b^2, c^2, (ab)^6, (ac)^6, (bc)^3 \rangle$.

There are two ways of going about this task. One is 
to imitate the above discussion of the free group itself, 
except that now we say that two words are equivalent 
if you can get from one to the other by inserting or 
deleting not just inverse pairs but also one of the words 
$a^2$, $b^2$, $c^2$, $(ab)^6$, $(ac)^6$, or $(bc)^3$. For example, $ab^2c$ is
equivalent to $ac$ in this group. $G$ is then defined to be
the set of equivalence classes of words with the product
coming from concatenation.

A neater way to obtain G is more conceptual and ex- 
ploits the universal property of the free group. As G 
is to be generated by a, b, and c, the universal prop- 
erty of the free group F(S) tells us that there will 
have to be a unique homomorphism $\Phi$ from $F(S)$ to $G$
such that $\Phi(a) = a$, $\Phi(b) = b$, and $\Phi(c) = c$. Moreover,
we require that all of $a^2$, $b^2$, $c^2$, $(ab)^6$, $(ac)^6$,
and $(bc)^3$ must map to the identity element in $G$. It
follows that the kernel [1.3 §4.1] of $\Phi$ is a normal
subgroup [1.3 §3.3] of $F(S)$ that contains the set
$R = \{a^2, b^2, c^2, (ab)^6, (ac)^6, (bc)^3\}$. Let us write $\langle\langle R \rangle\rangle$ for
the smallest normal subgroup of $F(S)$ that contains
$R$ (or equivalently the intersection of all normal subgroups
of $F(S)$ that contain $R$). Then there is a surjective
homomorphism from the quotient [1.3 §3.3]
$F(S)/\langle\langle R \rangle\rangle$ to any group that is generated by $a$, $b$, and
$c$ and satisfies the relations $a^2 = b^2 = c^2 = (ab)^6 = (ac)^6 = (bc)^3 = 1$.
This quotient itself is the group we
are looking for: it is the largest group generated by a, 
b, and c that satisfies the relations in R. 

Our assertion about $\Gamma_\Delta$ is that it is isomorphic to the
group $G = \langle a, b, c \mid a^2, b^2, c^2, (ab)^6, (ac)^6, (bc)^3 \rangle$ that
we have just described (in two ways). More precisely,
the map from $F(S)/\langle\langle R \rangle\rangle$ to $\Gamma_\Delta$ that takes $a$ to $\alpha$, $b$ to $\beta$,
and $c$ to $\gamma$ is an isomorphism.

The above construction is very general. If we are given
a group $\Gamma$, then a presentation of $\Gamma$ is a set $S$ that generates
$\Gamma$, together with a set $R \subset F(S)$ of relations, such
that $\Gamma$ is isomorphic to the quotient $F(S)/\langle\langle R \rangle\rangle$. If both
S and R are finite sets, one says that the presentation 
is finite. A group is finitely presented if it has a finite 
presentation. 

We can also define presentations in the abstract, 
without mentioning a group $\Gamma$ in advance: given any
set $S$ and any subset $R \subset F(S)$, we just define $\langle S \mid R \rangle$
to be the group $F(S)/\langle\langle R \rangle\rangle$. This is the “freest” group
generated by $S$ that satisfies the relations in $R$: the only
relations that hold in $\langle S \mid R \rangle$ are the ones that can be
deduced from the relations R. 

A psychological advantage of switching to this more 
abstract setting is that, whereas previously we began 
with a group $\Gamma$ and asked how we might present it, we
can now write down group presentations at will, starting
with any set $S$ and prescribing a set of words $R$ in
the symbols $S^{\pm 1}$. This gives us a very flexible way of
constructing a wide variety of groups. We might, for 
example, use a group presentation to encode a ques- 
tion from elsewhere in mathematics. We could then ask 
about the properties of the group thus defined, and see
what they had to tell us about our original problem. 

3 Why Study Finitely Presented Groups? 

Groups arise across the whole of mathematics as 
groups of automorphisms. These are maps from an 
object to itself that preserve all of the defining struc- 
ture: two examples are the invertible linear maps 
[1.3 §4.2] from a vector space [1.3 §2.3] to itself, 
and the homeomorphisms from a topological space 
[III.92] to itself. Groups encapsulate the essence of sym- 
metry and for this reason demand our attention. We 
are driven to understand their general nature, identify 
groups that deserve particular attention, and develop 
techniques for constructing new groups (from old ones, 
or from new ideas). And, reversing the process of 
abstraction, when given a group, we want to find con- 
crete instances of it. For example, we might like to 
realize it as the group of automorphisms of some 
interesting object, with the aim of illuminating the 
nature of both the object and the group. (See the article
on representation theory [IV.9] for more on this
theme.) 

3.1 Why Present Groups in Terms of 
Generators and Relations? 

The short answer is that this is the form in which 
groups often “appear in nature.” This is particularly 
true in topology. Before looking at a general result that 
illustrates this point, let us examine a simple example. 
Consider the group $D$ of all isometries of $\mathbb{R}$ that are
generated by the reflections at the points 0, 1, and 2:
that is, the group generated by the three functions $\alpha_0$,
$\alpha_1$, and $\alpha_2$, which take $x$ to $-x$, $2 - x$, and $4 - x$, respectively.
You may recognize this group to be the infinite
dihedral group, and you may notice that the generator
$\alpha_2$ is superfluous, since it can be generated from $\alpha_0$
and $\alpha_1$. But let us close our eyes to these observations
as we let a presentation emerge from the action. 

To this end, we choose an open interval U with the 
property that the images of U under the maps in D 
cover the whole of the real line, say $U = (-\frac{1}{2}, \frac{3}{2})$. Now
let us record two pieces of data: the only elements of
$D$ (apart from the identity) that fail to move $U$ completely
off itself are $\alpha_0$ and $\alpha_1$, and, among all products
of length at most 3 in those two letters, the only
nontrivial ones that act as the identity on $\mathbb{R}$ are $\alpha_0^2$ and
$\alpha_1^2$. You may like to prove that $\langle \alpha_0, \alpha_1 \mid \alpha_0^2, \alpha_1^2 \rangle$ is a
presentation of $D$.
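
The two pieces of data can be checked by direct computation. The sketch below is a finite experiment, not a proof: it assumes the interval $U = (-\frac{1}{2}, \frac{3}{2})$, stores the isometry $x \mapsto sx + t$ as the pair `(s, t)`, and samples only the elements of $D$ expressible as words of length at most 6 in the reflections at 0 and 1.

```python
from itertools import product

# The isometry x -> s*x + t of the real line is stored as (s, t), s = +1 or -1.
IDENTITY = (1, 0)
ALPHA = {0: (-1, 0), 1: (-1, 2), 2: (-1, 4)}   # x -> -x, 2 - x, 4 - x

def compose(f, g):
    """Return f∘g (apply g first, then f)."""
    (s1, t1), (s2, t2) = f, g
    return (s1 * s2, s1 * t2 + t1)

def evaluate(word):
    """Compose the generators named by a tuple of indices, e.g. (1, 0, 1)."""
    result = IDENTITY
    for index in word:
        result = compose(result, ALPHA[index])
    return result

def image_of_interval(f, interval):
    s, t = f
    lo, hi = interval
    ends = (s * lo + t, s * hi + t)
    return (min(ends), max(ends))

def overlaps(i, j):
    """Two open intervals meet iff each starts before the other ends."""
    return i[0] < j[1] and j[0] < i[1]

U = (-0.5, 1.5)   # assumed choice of the open interval
words = [w for n in range(1, 7) for w in product((0, 1), repeat=n)]

# Words of length at most 3 that act as the identity.
trivial_words = [w for w in words if len(w) <= 3 and evaluate(w) == IDENTITY]

# Sampled nonidentity elements whose image of U still meets U.
sampled = {evaluate(w) for w in words} - {IDENTITY}
overlapping = {f for f in sampled if overlaps(image_of_interval(f, U), U)}
```

With these definitions `evaluate((1, 0, 1))` agrees with `ALPHA[2]`, which is the sense in which the third reflection is superfluous.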



This is in fact a special case of a general result, which
we now state. (The proof of it is somewhat involved.) Let
$X$ be a topological space that is both path connected
[IV.6 §1] and simply connected [III.95], and let $\Gamma$ be a
group of homeomorphisms from $X$ to itself. Then any
choice of path-connected open subset $U \subset X$ such that
the images of $U$ cover all of $X$ gives rise to a presentation
$\Gamma = \langle S \mid R \rangle$, where $S = \{\gamma \in \Gamma \mid \gamma(U) \cap U \neq \emptyset\}$
and $R$ consists of all words $w \in F(S)$ of length at most
3 such that $w = 1$ in $\Gamma$. Thus, the identification of a
suitable subset $U$ provides one with a presentation of
$\Gamma$, and the task of a group theorist is to determine the
nature of the group from this information.

To see how difficult this task is, you might like to 
consider the groups 

$$G_n = \langle a_1, \ldots, a_n \mid a_i^{-1} a_{i+1} a_i a_{i+1}^{-2},\ i = 1, \ldots, n \rangle,$$
where we interpret $i + 1$ as 1 when $i = n$. One of $G_3$ and
$G_4$ is trivial and the other is infinite. Can you decide
which is which? 

To illustrate a more subtle point, let us consider a 
finitely presented group that we perhaps feel we understand:
the group $\Gamma_\Delta$ that we were discussing earlier. If
we want to describe this group to a blind friend unfa- 
miliar with the triangular tiling of the plane, what can 
we say to make her understand the group, or at least 
convince her that we understand the group? 

Our friend might reasonably ask us to list the ele- 
ments of our group, so we begin to describe them as 
products (words) in the given generators. But as we
begin to do so we hit a problem: we do not want to
list any element more than once and in order to avoid
redundancy we have to know which pairs of words
$w_1$, $w_2$ represent the same element of $\Gamma_\Delta$; equivalently,
we must be able to recognize which words $w_1^{-1}w_2$ are
relations in the group. Determining which words are
relations is called the word problem for the group. Even
in $\Gamma_\Delta$ this takes some work, and in the groups $G_n$ we
quickly find ourselves at a loss. 

Note that as well as allowing one to list the elements 
of the group effectively, a solution to the word problem
also allows one to determine the multiplication
table, since deciding whether $w_1 w_2 = w_3$ is the same
as deciding whether $w_1 w_2 w_3^{-1} = 1$.
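
For a group in which the word problem is easy, this reduction can be made completely explicit. The sketch below uses the infinite dihedral group with two generators of order 2, written as letters 0 and 1; it assumes the standard fact that a word in these letters is trivial exactly when repeated cancellation of adjacent equal letters empties it. The function names are hypothetical.

```python
def reduce_word(word):
    """Cancel adjacent equal letters; each generator is its own inverse."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter:
            stack.pop()
        else:
            stack.append(letter)
    return tuple(stack)

def is_relation(word):
    """Solve the word problem: does the word represent the identity?"""
    return reduce_word(word) == ()

def inverse(word):
    """Inverse of a word in involutions: reverse the order of the letters."""
    return tuple(reversed(word))

def equal_products(w1, w2, w3):
    """Decide whether w1*w2 = w3 by testing w1*w2*w3^-1 = 1."""
    return is_relation(tuple(w1) + tuple(w2) + inverse(w3))

# One entry of the multiplication table: (0 1)(1 0 1) equals the single letter 1.
entry_checked = equal_products((0, 1), (1, 0, 1), (1,))
```

The same pattern applies to any group with a solvable word problem; only `is_relation` needs to change.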

3.2 Why Finitely Presented Groups? 

The packaging of infinite objects into finite amounts 
of data arises throughout mathematics in the vari- 
ous guises of compactness [III.9]. Finite presentation 
is basically a compactness condition: a group can be 
finitely presented if and only if it is the fundamental
group of a reasonable compact space, as we shall see 
later. 

Another good reason for studying finitely presented
groups is that the Higman embedding theorem (to be
discussed later) allows us to encode questions about
arbitrary Turing machines [IV.20 §1.1] as questions
about such groups and their subgroups. 

4 The Fundamental Decision Problems 

In exploring the geometry and topology of low-dimen- 
sional manifolds at the beginning of the twentieth cen- 
tury, Max Dehn saw that many of the problems that 
he was wrestling with could be “reduced” to questions 
about finitely presented groups. For example, he gave
a simple formula for associating with a knot diagram
[III.46] a finite presentation of a group. There was one
relation for each crossing in the diagram and he argued
that the resulting group would be isomorphic to $\mathbb{Z}$ if
and only if the knot was the unknot: that is, if and only 
if it could be continuously deformed into a circle. It 
is extremely hard to tell by staring at a knot diagram 
whether it is actually the unknot, so this seems like a 
useful reduction until one realizes that it can be just as 
hard to tell whether a finitely presented group is isomorphic
to $\mathbb{Z}$. For example, here is the presentation of
$\mathbb{Z}$ that Dehn’s recipe associates with one of the smallest
possible pictures of the unknot, namely a diagram with 
just four crossings: 

$$\langle a_1, a_2, a_3, a_4, a_5 \mid a_1^{-1} a_3 a_4^{-1},\ a_2 a_3^{-1} a_1,\ a_3 a_4^{-1} a_2^{-1},\ a_4 a_5^{-1} a_4 a_3^{-1} \rangle.$$

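
It is a short exercise to confirm that a presentation of this shape defines $\mathbb{Z}$. The elimination below assumes that the four relators read $a_1^{-1}a_3a_4^{-1}$, $a_2a_3^{-1}a_1$, $a_3a_4^{-1}a_2^{-1}$, and $a_4a_5^{-1}a_4a_3^{-1}$; under that assumption every generator is a power of $a_4$.

```latex
\begin{align*}
a_1^{-1}a_3a_4^{-1} = 1 &\implies a_3 = a_1a_4, \\
a_2a_3^{-1}a_1 = 1 &\implies a_2 = a_1^{-1}a_3 = a_4, \\
a_3a_4^{-1}a_2^{-1} = 1 &\implies a_3 = a_2a_4 = a_4^2
  \implies a_1 = a_3a_4^{-1} = a_4, \\
a_4a_5^{-1}a_4a_3^{-1} = 1 &\implies a_5 = a_4a_3^{-1}a_4 = a_4a_4^{-2}a_4 = 1.
\end{align*}
```

Substituting $a_1 = a_2 = a_4$, $a_3 = a_4^2$, and $a_5 = 1$ turns each relator into the empty word, so no condition is imposed on $a_4$ and the group is infinite cyclic. Even in this tiny case the conclusion is not visible at a glance.
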
Thus Dehn’s investigations led him to understand 
how difficult it is to extract information from a group
presentation. In particular, he was the first to identify 
the fundamental role of the word problem, which we 
alluded to earlier, and he was one of the first to begin to 
understand that there are fundamental problems asso- 
ciated with the challenge of developing algorithms that 
extract knowledge from well-defined objects such as 
group presentations. In his famous article of 1912 Dehn 
writes: 

The general discontinuous group is given by n gener- 
ators and m relations between them. . . . Here there are 
above all three fundamental problems whose solution 
is very difficult and which will not be possible without
a penetrating study of the subject. 

1. The identity [word] problem: An element of the
group is given as a product of generators. One
is required to give a method whereby it may be
decided in a finite number of steps whether this
element is the identity or not.

2. The transformation [conjugacy] problem: Any two 
elements S and T of the group are given. A method 
is sought for deciding the question whether S and 
T can be transformed into each other, i.e., whether 
there is an element U of the group satisfying the 
relation 

$$S = UTU^{-1}.$$

3. The isomorphism problem: Given two groups, one 
is to decide whether they are isomorphic or not (and 
further, whether a given correspondence between 
the generators of one group and elements of the 
other is an isomorphism or not). 

We shall take these problems as the starting point 
for three lines of enquiry. First, we shall work toward 
an outline of the proof that all of these problems are, in 
a strict sense, unsolvable for general finitely presented
groups. 

The second use that we shall make of Dehn’s prob- 
lems is to hold them up as fundamental measures of 
complexity for each of the classes of groups that we 
subsequently encounter. If we can prove, for example, 
that the isomorphism problem is solvable in one class 
of groups but not in another, then we will have given 
genuine substance to previously vague assertions to the 
effect that the second class is “harder.” 

Finally, I want to make the point that geometry lies 
at the heart of the fundamental issues in combinato- 
rial group theory: it may not be immediately obvious, 
but its implicit presence is nonetheless a fundamental 
trait of group theory and not something imposed for 
reasons of taste. To illustrate this point I shall explain 
how the study of the large-scale geometry of least-area
disks in Riemannian manifolds [1.3 §6.10] is intimately
connected with the study of the complexity of
word problems in arbitrary finitely presented groups. 

5 New Groups from Old 

Suppose that you have two groups, $G_1$ and $G_2$, and want
to combine them to form a new group. The first method
that is taught in a typical course on group theory is
to take the Cartesian product $G_1 \times G_2$: a typical element
has the form $(g, h)$ with $g \in G_1$ and $h \in G_2$,
and the product of $(g, h)$ with $(g', h')$ is defined to be
$(gg', hh')$. The set of elements of the form $(g, e)$ (where
$e$ is the identity of $G_2$) is a copy of $G_1$ inside $G_1 \times G_2$,
and similarly the set of elements of the form $(e, h)$ is a
copy of $G_2$.
These copies have nontrivial relations between their
elements: for example, $(e, h)(g, e) = (g, e)(e, h)$. We
would now like to take two groups $\Gamma_1$ and $\Gamma_2$ and combine
them in a different way to form a group called the
free product $\Gamma_1 * \Gamma_2$, which contains copies of $\Gamma_1$ and
$\Gamma_2$ and as few additional relations as possible. That is,
we would like there to be embeddings $i_j : \Gamma_j \hookrightarrow \Gamma_1 * \Gamma_2$
so that $i_1(\Gamma_1)$ and $i_2(\Gamma_2)$ generate $\Gamma_1 * \Gamma_2$ but they are
not intertwined in any way. This requirement is neatly
encapsulated by the following universal property: given
any group $G$ and any two homomorphisms $\phi_1 : \Gamma_1 \to G$
and $\phi_2 : \Gamma_2 \to G$, there should be a unique homomorphism
$\Phi : \Gamma_1 * \Gamma_2 \to G$ such that $\Phi \circ i_j = \phi_j$ for $j = 1, 2$.
(Less formally, $\Phi$ behaves like $\phi_1$ on the copy of $\Gamma_1$ and
behaves like $\phi_2$ on the copy of $\Gamma_2$.)

It is easy to check that this property characterizes
$\Gamma_1 * \Gamma_2$ up to isomorphism, but it leaves open the
question of whether $\Gamma_1 * \Gamma_2$ actually exists. (These are
the standard pros and cons of defining an object by
means of a universal property.) In the present setting,
existence is easily established using presentations: let
$\langle A_1 \mid R_1 \rangle$ be a presentation of $\Gamma_1$ and let $\langle A_2 \mid R_2 \rangle$ be
a presentation of $\Gamma_2$, with $A_1$ and $A_2$ disjoint, and then
define $\Gamma_1 * \Gamma_2$ to be $\langle A_1 \sqcup A_2 \mid R_1 \sqcup R_2 \rangle$ (where $\sqcup$ denotes
a union of disjoint sets).

More intuitively, one can define $\Gamma_1 * \Gamma_2$ to be the set
of alternating sequences $a_1 b_1 \cdots a_n b_n$ with each $a_i$
belonging to $\Gamma_1$ and each $b_j$ belonging to $\Gamma_2$, with the
extra condition that none of the $a_i$ and $b_j$ equals the
identity, except possibly $a_1$ or $b_n$. The group operations
in $\Gamma_1$ and $\Gamma_2$ extend to this set in an obvious
way: for example, $(a_1 b_1 a_2)(a_1' b_1') = a_1 b_1 a_2' b_1'$, where
$a_2' = a_2 a_1'$, except that if $a_2 a_1' = 1$ then the product
cancels down to $a_1 b_2'$, where $b_2' = b_1 b_1'$.
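
The multiplication rule for alternating sequences can be sketched in a few lines. The example below is a sketch under stated assumptions: the two factors are taken to be the cyclic groups of orders 2 and 3 written additively (an illustrative choice), an element is stored as a list of (factor, value) syllables with no identity syllables, and `multiply` is a hypothetical name.

```python
# Factor 1 is the cyclic group of order 2 and factor 2 the cyclic group of
# order 3, both written additively (an illustrative choice of factors).
MODULI = {1: 2, 2: 3}

def multiply(u, v):
    """Product of two reduced alternating sequences in the free product.

    Each sequence is a list of (factor, value) syllables in which adjacent
    syllables come from different factors and no value is the identity 0.
    """
    result = list(u)
    for factor, x in v:
        if result and result[-1][0] == factor:
            # Adjacent syllables from the same factor: multiply inside it.
            _, y = result.pop()
            z = (y + x) % MODULI[factor]
            if z != 0:
                result.append((factor, z))
            # If z == 0 the syllable vanishes and the next one may merge too.
        else:
            result.append((factor, x))
    return result

a = (1, 1)   # the generator of the first factor
b = (2, 1)   # a generator of the second factor

# (a b a)(a b): the two copies of a cancel, then b*b merges to one syllable.
cancelling = multiply([a, b, a], [a, b])

# (a b)(b^2 a) collapses completely, so b^2 a is the inverse of a b.
collapsed = multiply([a, b], [(2, 2), a])
```

The first product is the cancelling case described above: the inner pair from the first factor multiplies to the identity, after which the neighbouring syllables from the second factor are combined.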

Free products occur naturally in topology: if one has
topological spaces $X_1$, $X_2$ with marked points $p_1 \in X_1$,
$p_2 \in X_2$, then the fundamental group [IV.6 §2] of
the space $X_1 \vee X_2$ obtained from $X_1 \sqcup X_2$ by making
the identification $p_1 = p_2$ is the free product
of $\pi_1(X_1, p_1)$ and $\pi_1(X_2, p_2)$. The Seifert-van Kampen
theorem tells one how to present the fundamental
group of a space obtained by gluing $X_1$ and $X_2$ along
larger subspaces. If the inclusion of the subspaces gives
rise to an injection of fundamental groups, then one
can express the fundamental group of the resulting
space as an amalgamated free product, which we now 
define. 

Let $\Gamma_1$ and $\Gamma_2$ be two groups. If some other group
contains copies of $\Gamma_1$ and $\Gamma_2$, then the intersection of
those copies must contain the identity element. The
free product $\Gamma_1 * \Gamma_2$ was the freest group we could
build that was subject to this minimal constraint. Now
we shall insist that the copies of $\Gamma_1$ and $\Gamma_2$ intersect
nontrivially, specify which of their subgroups must lie
in the intersection, and build the freest group that
satisfies this constraint.

Suppose, then, that $A_1$ is a subgroup of $\Gamma_1$ and that $\phi$
is an isomorphism from $A_1$ to a subgroup $A_2$ of $\Gamma_2$. As
in the example of the free product, one can define the
“freest product that identifies $A_1$ and $A_2$” by means
of a universal property. Again, one can establish the
existence of such a group using presentations: if
$\Gamma_1 = \langle S_1 \mid R_1 \rangle$ and $\Gamma_2 = \langle S_2 \mid R_2 \rangle$, the group we seek takes
the form

$$\langle S_1 \sqcup S_2 \mid R_1 \sqcup R_2 \sqcup T \rangle.$$

Here, $T = \{u_a v_a^{-1} \mid a \in A_1\}$, where $u_a$ is some word
that represents $a$ in (the presentation of) $\Gamma_1$ and $v_a$ is
a word that represents $\phi(a)$ in $\Gamma_2$.

This group is called the amalgamated free product of
$\Gamma_1$ and $\Gamma_2$ along $A_1$ and $A_2$. It is often described by the
casual and ambiguous notation $\Gamma_1 *_{A_1 = A_2} \Gamma_2$, or even
$\Gamma_1 *_A \Gamma_2$, where $A \cong A_j$ is an abstract group.

Unlike with free products, it is no longer obvious that
the maps $\Gamma_j \to \Gamma_1 *_A \Gamma_2$ implicit in this construction are
injective, but they do turn out to be, as was shown by
Schreier in 1927.

A related construction of Higman, Neumann, and 
Neumann in 1949 answers the following question: 
given a group $\Gamma$ and an isomorphism $\psi : B_1 \to B_2$
between subgroups of $\Gamma$, can one always embed $\Gamma$ in
a bigger group so that $\psi$ becomes the restriction to $B_1$
of a conjugation? 

By now, having seen the idea in the context of 
both free products and amalgamated free products, 
the reader may guess how one goes about answering 
this question: one writes down the presentation of a 
universal candidate for the desired enveloping group, 
denoted $\Gamma *_\psi$, and then one sets about proving that the
natural map from $\Gamma$ to $\Gamma *_\psi$ (which takes each word
to itself) is injective. Thus, given $\Gamma = \langle A \mid R \rangle$, we introduce
a symbol $t \notin A$ (usually called the stable letter), we
choose for each $b \in B_1$ words $\hat{b}, \tilde{b} \in F(A)$ with $\hat{b} = b$
and $\tilde{b} = \psi(b)$ in $\Gamma$, and we define

$$\Gamma *_\psi = \langle A, t \mid R,\ t\hat{b}t^{-1}\tilde{b}^{-1}\ (b \in B_1) \rangle.$$

This is the freest group we can build from $\Gamma$ by adjoining
a new element $t$ and requiring it to satisfy all the
equations we want it to, namely $t\hat{b}t^{-1} = \tilde{b}$ for every
$b \in B_1$ (which we can think of as saying that $tbt^{-1} = \psi(b)$).
This group is called an HNN extension of $\Gamma$ (after
Higman, Neumann, and Neumann). 

Now we must show that the natural map from $\Gamma$ to
$\Gamma *_\psi$ is injective. That is, if you take an element $\gamma$ of $\Gamma$
and regard it as an element of $\Gamma *_\psi$, you should not be
able to use $t$ and the relations in $\Gamma *_\psi$ to cancel $\gamma$ down
to the identity. This is proved with the help of the following
more general result known as Britton’s lemma.
Suppose that $w$ is a word in the free group $F(A, t)$. Then
the only circumstances under which it can give rise to
the identity in the group $\Gamma *_\psi$ are if either it does not
involve $t$ and represents the identity in $\Gamma$ or it involves
$t$ but can be simplified in an obvious way by containing
a “pinch.” A pinch is a subword of the form $tbt^{-1}$,
where $b$ is a word in $F(A)$ that represents an element
of $B_1$ (in which case we can replace it by $\psi(b)$), or one
of the form $t^{-1}b't$, where $b'$ represents an element of
$B_2$ (in which case we can replace it by $\psi^{-1}(b')$). Thus,
if you are given a word that involves $t$ and contains
no pinches, then you know that it cannot be canceled
down to the identity.

A similar noncancellation result holds for the amalgamated
free product $\Gamma_1 *_{A_1 = A_2} \Gamma_2$. If $g_1, \ldots, g_n$ belong
to $\Gamma_1$ but not to $A_1$ and $h_1, \ldots, h_n$ belong to $\Gamma_2$ but not
to $A_2$, then the word $g_1 h_1 g_2 h_2 \cdots g_n h_n$ cannot equal
the identity in $\Gamma_1 *_{A_1 = A_2} \Gamma_2$.

These noncancellation results do far more than show 
that the natural homomorphisms we have been con- 
sidering are injective: they also demonstrate further 
aspects of freeness in amalgamated free products and 
HNN extensions. For example, suppose that in the amalgamated
free product $\Gamma_1 *_{A_1 = A_2} \Gamma_2$ we can find an element
$g$ of $\Gamma_1$ that generates an infinite group that intersects
$A_1$ in the identity and an element $h$ of $\Gamma_2$ that does
the same for $A_2$. Then the subgroup of $\Gamma_1 *_{A_1 = A_2} \Gamma_2$ generated
by $g$ and $h$ is the free group on those two generators.
With a little more effort, one can deduce that
any finite subgroup of $\Gamma_1 *_{A_1 = A_2} \Gamma_2$ has to be conjugate
to a subgroup of the obvious copy of either $\Gamma_1$ or $\Gamma_2$.
Similarly, the finite subgroups of $\Gamma *_\psi$ are conjugates
of subgroups of $\Gamma$. We shall exploit these facts in the
constructions that follow. 

There are many ways of combining groups that I 
have not mentioned here. I have chosen to focus on 
amalgamated free products and HNN extensions partly 
because they lead to transparent solutions of the basic 
problems discussed below but more because of their 
primitive appeal and the way in which they arise nat- 
urally in the calculation of fundamental groups. They 
also mark the beginning of arboreal group theory,
which we will discuss later. If space allowed, I would go
on to describe semidirect and wreath products, which 
are also indispensable tools of the group theorist. 

Before turning to some applications of HNN exten- 
sions and amalgamated free products, I want to return 
to the Burnside problem, which asks if there exist 
finitely generated infinite groups all of whose ele- 
ments have a given finite order. This question gener- 
ated important developments throughout the twenti- 
eth century, particularly in Russia. It is appropriate to 
mention it here because it provides another illustration 
of the fact that it can be useful to study a universal
object in order to solve a general question. 

5.1 The Burnside Problem 

Given an exponent $m$, one clarifies the problem at hand
by considering the free Burnside group $B_{n,m}$ given by
the presentation $\langle a_1, \ldots, a_n \mid R_m \rangle$, where $R_m$ consists
of all $m$th powers in the free group $F(a_1, \ldots, a_n)$. It is
clear that $B_{n,m}$ maps onto any group with at most $n$
generators in which every element has order dividing
$m$. Therefore, there exists a finitely generated infinite
group with all elements of the same finite order if and
only if, for suitable values of $n$ and $m$, the group $B_{n,m}$
is infinite. Thus, a question that takes the form, Does 
there exist a group such that ...?, becomes a question 
about just one group. 

Novikov and Adian showed in 1968 that $B_{n,m}$ is infinite
when $n \geq 2$ and $m \geq 667$ is odd. Determining
the exact range of values for which $B_{n,m}$ is infinite is
an active area of research. Of far greater interest is the
open question of whether there exist finitely presented
infinite groups that are quotients of $B_{n,m}$. Zelmanov
was awarded the Fields Medal for proving that each
$B_{n,m}$ has only finitely many finite quotients.

5.2 Every Countable Group Can Be Embedded in a 
Finitely Generated Group 

Given a countable group $G$ we list its elements,
$g_0, g_1, g_2, \ldots$, taking $g_0$ to be the identity. We then take
a free product of $G$ with an infinite cyclic group $\langle s \rangle \cong \mathbb{Z}$.
Let $\Sigma_1$ be the set of all elements of $G * \mathbb{Z}$ of the form
$s_n = g_n s^n$ with $n \geq 1$. Then the subgroup $\langle \Sigma_1 \rangle$ generated
by $\Sigma_1$ is isomorphic to the free group $F(\Sigma_1)$. Similarly,
if we let $\Sigma_2 = \{s_2, s_3, \ldots\}$ (so it is $\Sigma_1$ with the
element $s_1 = g_1 s$ removed), then $\langle \Sigma_2 \rangle$ is isomorphic to
$F(\Sigma_2)$. It follows that the map $\psi(s_n) = s_{n+1}$ gives rise to
an isomorphism from $\langle \Sigma_1 \rangle$ to $\langle \Sigma_2 \rangle$. Now take the HNN
extension $(G * \mathbb{Z}) *_\psi$, whose stable letter we denote by
$t$. This group contains a copy of $G$, as we noted before.
Moreover, since we have ensured that $t s_n t^{-1} = s_{n+1}$ for
every $n \geq 1$, it can be generated by just the three elements
$s_1$, $s$, and $t$. Thus, we have embedded an arbitrary
countable group into a group with three generators. (We
leave the reader to think about how one can vary this
construction to produce a group with two generators.)

5.3 There Are Uncountably Many Nonisomorphic 
Finitely Generated Groups 

This was proved by B. H. Neumann in 1932. Since there 
are infinitely many primes, there are uncountably many 
nonisomorphic groups of the form $\bigoplus_{p \in P} \mathbb{Z}_p$, where $P$
is an infinite set of primes. We have seen that each of 
these groups can be embedded in a finitely generated 
group, and our earlier comments on finite subgroups 
of HNN extensions show that no two of the resulting 
finitely generated groups are isomorphic. 

5.4 An Answer to Hopf’s Question 

A group G is called Hopfian if every surjective homo- 
morphism from G to G is an isomorphism. Most 
familiar groups have this property: for example, finite 
groups obviously do, as do $\mathbb{Z}^n$ (as you can prove using
linear algebra) and free groups. So too do groups
of matrices such as $\mathrm{SL}_n(\mathbb{Z})$, as we shall discuss in
a moment. An example of a non-Hopfian group is
the group of all infinite sequences of integers (under
pointwise addition), since the function that takes
$(a_1, a_2, a_3, \ldots)$ to $(a_2, a_3, a_4, \ldots)$ is a surjective homomorphism
that contains $(1, 0, 0, \ldots)$ in its kernel. But
is there a finitely presented example? The answer is 
yes, and Higman was the first to construct one. The 
following examples are due to Baumslag and Solitar. 

Let $p \geq 2$ be an integer and identify $\mathbb{Z}$ with the free
group $\langle a \rangle$ generated by a single generator $a$. Then the
subgroups $p\mathbb{Z}$ and $(p + 1)\mathbb{Z}$ of $\mathbb{Z}$ are identified with the
powers of $a^p$ and $a^{p+1}$, respectively. Let $\psi$ be the isomorphism
between these subgroups that takes $a^p$ to
$a^{p+1}$ and consider the corresponding HNN extension
$B$. This has presentation $B = \langle a, t \mid ta^{-p}t^{-1}a^{p+1} \rangle$. The
homomorphism $\phi : B \to B$ defined by $t \mapsto t$, $a \mapsto a^p$
is clearly a surjection but its kernel contains, for example,
the element $c = ata^{-1}t^{-1}a^{-2}tat^{-1}a$, which does
not contain a pinch and is therefore not equal to the
identity, by Britton’s lemma. (If you want to convince
yourself how useful this lemma is, set $p = 3$ and try to
prove directly that $c$ is not equal to the identity in the
group $B$ just defined.)
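
Both claims about $c$ can be checked mechanically by searching for pinches. The sketch below is an illustration under stated assumptions: a word is a list of tokens `("a", k)` for $a^k$ and `("t", 1)` or `("t", -1)` for $t^{\pm 1}$, a pinch $t a^{kp} t^{-1}$ is rewritten as $a^{k(p+1)}$ and a pinch $t^{-1} a^{k(p+1)} t$ as $a^{kp}$, and the function names are hypothetical. By Britton’s lemma, a word that still involves $t$ once no pinch remains is nontrivial.

```python
def normalize(word):
    """Merge adjacent powers of a and drop a^0."""
    out = []
    for letter, exp in word:
        if letter == "a":
            if exp == 0:
                continue
            if out and out[-1][0] == "a":
                total = out.pop()[1] + exp
                if total != 0:
                    out.append(("a", total))
            else:
                out.append(("a", exp))
        else:
            out.append(("t", exp))   # t-letters are stored one at a time
    return out

def find_pinch(word, p):
    """Return (i, j, e, k) for a pinch t^e a^k t^-e at positions i..j, or None."""
    for i, (letter, e) in enumerate(word):
        if letter != "t":
            continue
        if i + 1 < len(word) and word[i + 1][0] == "a":
            k, j = word[i + 1][1], i + 2
        else:
            k, j = 0, i + 1          # nothing between the two t-letters
        if j < len(word) and word[j] == ("t", -e):
            modulus = p if e == 1 else p + 1
            if k % modulus == 0:
                return i, j, e, k
    return None

def reduce_word(word, p):
    """Rewrite pinches until none remain."""
    word = normalize(word)
    while (pinch := find_pinch(word, p)) is not None:
        i, j, e, k = pinch
        new_exp = (k // p) * (p + 1) if e == 1 else (k // (p + 1)) * p
        word = normalize(word[:i] + [("a", new_exp)] + word[j + 1:])
    return word

def phi(word, p):
    """The endomorphism t -> t, a -> a^p applied letter by letter."""
    return [(l, e * p) if l == "a" else (l, e) for l, e in word]

# c = a t a^-1 t^-1 a^-2 t a t^-1 a
c = [("a", 1), ("t", 1), ("a", -1), ("t", -1), ("a", -2),
     ("t", 1), ("a", 1), ("t", -1), ("a", 1)]
```

With $p = 3$, `reduce_word(c, 3)` returns `c` unchanged, since no pinch is present, whereas `reduce_word(phi(c, 3), 3)` returns the empty word, which exhibits $c$ as a nontrivial element of the kernel.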
5.5 A Group that Has No Faithful
Linear Representation 

One can show that a finitely generated group G of 
matrices over any field is residually finite, which means 
that for each nontrivial element $g \in G$ there exists a
finite group $Q$ and a homomorphism $\pi : G \to Q$ with
$\pi(g) \neq 1$. For example, if you are given a nontrivial element
$g \in \mathrm{SL}_n(\mathbb{Z})$, then you can pick an integer $m$ bigger than
twice the absolute values of all the entries in $g$ (which is an
$n \times n$ matrix) and consider the homomorphism from
$\mathrm{SL}_n(\mathbb{Z})$ to $\mathrm{SL}_n(\mathbb{Z}/m\mathbb{Z})$ that reduces the matrix entries
mod $m$. The image of $g$ in the finite group $\mathrm{SL}_n(\mathbb{Z}/m\mathbb{Z})$
is clearly nontrivial. 
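
To make the reduction concrete, here is a small sketch in Python. The function names are my own, and the choice of a modulus that is at least 3 is a precaution I have added so that the matrix $-I$ is not sent to the identity when one reduces mod 2.

```python
def reduce_mod(matrix, m):
    """Reduce every entry of an integer matrix modulo m."""
    return [[entry % m for entry in row] for row in matrix]


def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def separating_modulus(matrix):
    """Smallest m >= 3 that exceeds the absolute value of every entry.

    Assumption of this sketch: insisting on m >= 3 keeps -1 and 1 distinct mod m.
    """
    return max(3, 1 + max(abs(entry) for row in matrix for entry in row))


g = [[1, 5], [0, 1]]            # a nontrivial element of SL_2(Z)
m = separating_modulus(g)
image = reduce_mod(g, m)        # the image of g in SL_2(Z/mZ)

minus_identity = [[-1, 0], [0, -1]]
m2 = separating_modulus(minus_identity)
image2 = reduce_mod(minus_identity, m2)
```

In both cases the reduced matrix differs from the identity, so the finite quotient still "sees" the element.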

Finitely generated non-Hopfian groups are not residually finite, and
hence are not isomorphic to a group of matrices over
any field. One can see that the non-Hopfian group $B$
defined above is not residually finite by considering
what happens to the nontrivial element $c$. We saw that
there was a surjective homomorphism $\varphi : B \to B$ with
$\varphi(c) = 1$. For each $n$, let $c_n$ be an element such that $\varphi^n(c_n) = c$
(which exists since $\varphi$ is a surjection). If there were
a homomorphism $\pi$ from $B$ to a finite group $Q$ with
$\pi(c) \neq 1$, then we would have infinitely many distinct
homomorphisms from $B$ to $Q$, namely the compositions
$\pi \circ \varphi^n$; these are distinct because $\pi \circ \varphi^m(c_n) = 1$
if $m > n$ and $\pi \circ \varphi^n(c_n) = \pi(c) \neq 1$. This is a
contradiction, since a homomorphism from a finitely
generated group to a finite group is determined by what
it does to the generators, so there can only be finitely
many such homomorphisms.

5.6 Infinite Simple Groups 

Britton’s lemma actually tells us more than that $c \neq 1$:
the subgroup $\Lambda$ of $B$ generated by $t$ and $c$ is in fact
a free group on those generators. Thus we may form
the amalgamated free product $\Gamma$ of two copies of $B$,
denoted $B_1$ and $B_2$, by gluing together the two copies
of $\Lambda$ with the isomorphism $c_1 \mapsto t_2$, $t_1 \mapsto c_2$. We have
seen that in any finite quotient of $\Gamma = B_1 *_\Lambda B_2$, the
elements $c_1$ $(= t_2)$ and $c_2$ $(= t_1)$ must have trivial image,
and it is easy to deduce from this that in fact the
quotient must be trivial. Thus $\Gamma$ is an infinite group with no
finite quotients. It follows that the quotient of $\Gamma$ by any
maximal proper normal subgroup is also infinite (and
it is simple by maximality).

The simple group that we have constructed is infinite
and finitely generated but it is not finitely presentable.
Finitely presented infinite simple groups do exist, but
they are much harder to construct.

6 Higman’s Theorem and Undecidability

We have seen that there are uncountably many (non- 
isomorphic) finitely generated groups. But as there are 
only countably many finitely presented groups, only 
countably many finitely generated groups can be
subgroups of finitely presented groups. Which ones are
they? 

A complete answer to this question is provided by a 
beautiful and deep theorem proved by Graham Higman 
in 1961, which says, roughly, that the groups that arise 
are all those that are algorithmically describable. (If you 
have no idea what this means, even roughly, then you 
might like to read the insolubility of the halting 
problem [V.23] before continuing with this section.) 

A set S of words over a finite alphabet A is called 
recursively enumerable if there is some algorithm (or 
more formally, Turing machine) that can produce a 
complete list of the elements of S. A case of particu- 
lar interest is when A is just a singleton, in which case 
a word is determined by its length and we can think 
of S as a set of nonnegative integers. The elements of 
S need not be listed in a sensible order, so having an 
algorithm that produces an exhaustive list of S does 
not mean that one can use the algorithm to determine 
that some given word w does not belong to S: if you 
imagine standing by your computer as it enumerates 
S, there will not in general come a time when you can 
say to yourself, “If it was going to appear, then it would 
have done so by now,” and therefore be certain that it 
is not in S. If you want an algorithm with this further
property, then you need the stronger notion of a recursive
set, which is a set S such that S and its complement
are both recursively enumerable. Then you can list all
the elements that belong to S and you can also list all
the elements that do not belong to S.
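
The difference between the two notions can be seen in a toy sketch. The set used below (the perfect squares) is of course recursive, and every name in the code is my own invention; the point is only to show how two enumerations, one of S and one of its complement, combine into a procedure that always answers, whereas a single enumeration in scrambled order never lets you say "no."

```python
from itertools import count
from math import isqrt


def enumerate_S():
    """List the perfect squares, deliberately out of increasing order."""
    for k in count(0):
        yield (2 * k + 1) ** 2   # an odd square first...
        yield (2 * k) ** 2       # ...then the smaller even square


def enumerate_complement():
    """List the nonnegative integers that are not perfect squares."""
    for n in count(0):
        if isqrt(n) ** 2 != n:
            yield n


def decide(w, enum_in, enum_out):
    """Run both enumerations in step until w shows up in one of them.

    This terminates because w belongs to exactly one of the two lists.
    """
    members, nonmembers = enum_in(), enum_out()
    while True:
        if next(members) == w:
            return True
        if next(nonmembers) == w:
            return False
```

For instance, `decide(49, enumerate_S, enumerate_complement)` stops once 49 appears in the first list, and `decide(50, enumerate_S, enumerate_complement)` stops once 50 appears in the second. With `enumerate_S` alone, no finite amount of waiting would rule out 50.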

A finitely generated group is said to be recursively 
presentable if it has a presentation with a finite num- 
ber of generators and a recursively enumerable set of 
defining relations. In other words, such a group is not
necessarily finitely presented, but at least the presen- 
tation of the group is “nice” in the sense that it can be 
generated by some algorithm. 

Higman’s embedding theorem states that a finitely
generated group G is recursively presentable if and only
if it is isomorphic to a subgroup of a finitely presented 
group. 

To get a feeling for how nonobvious this is, you might 
consider the following presentation of the group of all 
rationals under addition, in which the generator $a_n$
corresponds to the fraction $1/n!$:

$$\mathbb{Q} = \langle a_1, a_2, \ldots \mid a_n^n = a_{n-1} \ \forall n \geq 2 \rangle.$$

Higman’s theorem tells us that Q can be embedded 
in a finitely presented group, but no truly explicit 
embedding is known. 

The power of Higman’s theorem is illustrated by the 
ease with which it implies the celebrated undecidabil- 
ity results that were rightly regarded as watersheds 
of twentieth-century mathematics. In order to make 
this case convincingly, I shall give a complete proof 
(except that I shall assume some of the facts men- 
tioned earlier) that there exist finitely presented groups 
with unsolvable word problems, and also that there are 
sequences of finitely presented groups among which 
one cannot decide isomorphism. We shall also see how 
these group-theoretic results can be used to translate 
undecidability phenomena into topology. 

The basic seed of undecidability comes from the fact
that there are recursively enumerable subsets $S \subset \mathbb{N}$
that are not recursive. Using this fact one can readily
construct finitely generated groups with an unsolvable
word problem: given such a set of integers $S$ we
consider

$$J = \langle a, b, t \mid t(b^n a b^{-n})t^{-1} = b^n a b^{-n} \ \forall n \in S \rangle.$$

This is the HNN extension of the free group $F(a,b)$
associated with the identity map $L \to L$, where
$L$ is the subgroup generated by $\{b^n a b^{-n} : n \in S\}$.
Britton’s lemma tells us that the word
$w_m = t(b^m a b^{-m})t^{-1}(b^m a^{-1} b^{-m})$ equals $1 \in J$ if and only
if $m \in S$, and by definition there is no algorithm to
decide if $m \in S$, so we cannot decide which of the $w_m$
are relations. Thus $J$ has an unsolvable word problem.

That there exist finitely presented groups for which
the word problem is unsolvable is a much deeper fact,
but with Higman’s embedding theorem at hand the
proof becomes almost trivial: Higman tells us that $J$ can
be embedded in a finitely presented group $\Gamma$, and it is a
relatively straightforward exercise to show that if one
cannot decide which words in the generators of $J$
represent the identity, then one cannot decide for arbitrary
words in the generators of $\Gamma$ either.

Once one has a finitely presented group with an
unsolvable word problem, it is easy to translate
undecidability into all manner of other problems. For example,
suppose that $\Gamma = \langle A \mid R \rangle$ is a finitely presented
group with an unsolvable word problem, where
$A = \{a_1, \ldots, a_n\}$ and no $a_i$ equals the identity in $\Gamma$. For each
word $w$ made out of the letters in $A$ and their inverses,
define a group $\Gamma_w$ to have presentation

(A,s,t | R, i= 1, . . . ,n). 

It is not hard to show that if $w = 1$ in $\Gamma$ then $\Gamma_w$ is the
free group generated by $s$ and $t$. If $w \neq 1$, then $\Gamma_w$ is
an HNN extension. In particular, it contains a copy of
$\Gamma$, and hence has an unsolvable word problem, which
means that it cannot be a free group. Thus, since there
is no algorithm to decide whether $w = 1$ in $\Gamma$, one cannot
decide which of the groups $\Gamma_w$ are isomorphic to
which others.

A variant of this argument shows that there is no 
algorithm to determine whether or not a given finitely 
presented group is trivial. 

We shall see in a moment that every finitely pre- 
sented group G is the fundamental group of some com- 
pact four-dimensional manifold. By following a stan- 
dard proof of this theorem with considerable care, 
Markov proved in 1958 that in dimensions 4 and above 
there is no algorithm to decide which compact mani- 
folds (presented as simplicial complexes, for example) 
are homeomorphic. His basic idea was to show that if 
there were an algorithm to determine which triangu- 
lated 4-manifolds are homeomorphic, then one could 
use it to determine which finitely presented groups are 
trivial, which we know is impossible. In order to imple- 
ment this idea one has to be careful to arrange that the 
4-manifolds associated with different presentations of 
the trivial group are homeomorphic: this is the delicate 
part of the argument. 

Strikingly, there does exist an algorithm to decide 
which compact three-dimensional manifolds are iso- 
morphic. This is an extremely deep theorem that relies 
in particular on Perelman’s solution to Thurston’s
geometrization conjecture [IV.7 §2.4].

7 Topological Group Theory 

Let us change perspective now and look at the symbols
$P = \langle a_1, \ldots, a_n \mid r_1, \ldots, r_m \rangle$ through the eyes of
a topologist. Instead of interpreting $P$ as a recipe for
constructing a group, we regard it as a recipe for
constructing a topological space [III.92], or more
specifically a two-dimensional complex. Such spaces consist
of points, called vertices, some of which are linked by
directed paths, called edges, or 1-cells. If a collection of
such 1-cells forms a cycle, then it can be filled in with
a face, or 2-cell: topologically speaking, each face is a
disk with a directed cycle as its boundary.

To see what this complex is, let us first consider the
standard presentation $P = \langle a, b \mid aba^{-1}b^{-1} \rangle$ of $\mathbb{Z}^2$.
(This is generated by $a$ and $b$ and the relation tells
us that $ab = ba$.) We begin with a graph $K^1$ that
has a single vertex and two edges (which are loops)
that are directed and labeled $a$ and $b$. Next, we take a
square $[0,1] \times [0,1]$, the sides of which are directed and
labeled $a$, $b$, $a^{-1}$, $b^{-1}$ as we proceed around the boundary.
Imagine gluing the boundary of the square to the
graph so as to respect the labeling of edges: with a bit
of thought, you should be able to see that the result
is a torus, that is, a surface in the shape of a bagel.
An observation that turns out to be important is that
the fundamental group of the torus is $\mathbb{Z}^2$, the group we
started with.

The idea of “gluing” is made precise by the use of
attaching maps: we take a continuous map $\varphi$ from the
boundary of the square $S$ to the graph $K^1$ that sends
the corners of the square to the vertex of $K^1$ and sends
each side (minus its vertices) homeomorphically onto
an open edge. The torus is then the quotient of $K^1 \sqcup S$
by the equivalence relation that identifies each $x$ in the
boundary of the square with its image $\varphi(x)$.

With this more abstract language in hand, it is easy to
see how the above construction generalizes to arbitrary
presentations: given a presentation
$P = \langle a_1, \ldots, a_n \mid r_1, \ldots, r_m \rangle$, one takes a graph with a single vertex and
$n$ oriented loops, which are labeled $a_1, \ldots, a_n$. Then
for each $r_j$ one attaches a polygonal disk by gluing its
boundary circuit to the sequence of oriented edges that
traces out the word $r_j$.

In general, the result will not be a surface as it was for
$\langle a, b \mid aba^{-1}b^{-1} \rangle$. Rather, it will be a two-dimensional
complex with singularities along the edges and at the
vertex. You may find it instructive to do some more
examples. From $\langle a \mid a^2 \rangle$ one gets the projective plane;
from $\langle a, b, c, d \mid aba^{-1}b^{-1}, cdc^{-1}d \rangle$ one gets a torus
and a Klein bottle stuck together at a point. Picturing
the 2-complex for $\langle a, b \mid a^2, b^3, (ab)^3 \rangle$ is already rather
difficult.
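
One way to see where the singularities along the edges come from is to count, for each generator, how many sides of the attached polygons are glued to its loop. In a surface every edge meets exactly two sides, so any other count signals a singular edge. The sketch below does this count for the examples above; the encoding of a relator as a string, with an uppercase letter standing for the inverse of a generator, is a convention I have chosen for the illustration.

```python
from collections import Counter


def cell_counts(generators, relators):
    """Numbers of cells of the complex: one vertex, one loop per generator,
    one polygonal face per relator."""
    return {"vertices": 1, "edges": len(generators), "faces": len(relators)}


def edge_incidence(generators, relators):
    """For each generator, count the polygon sides glued onto its loop."""
    counts = Counter({g: 0 for g in generators})
    for word in relators:
        for letter in word:
            counts[letter.lower()] += 1   # 'A' denotes the inverse of 'a'
    return dict(counts)


torus = edge_incidence("ab", ["abAB"])
projective_plane = edge_incidence("a", ["aa"])
torus_and_klein = edge_incidence("abcd", ["abAB", "cdCd"])
harder_example = edge_incidence("ab", ["aa", "bbb", "ababab"])
```

The first three examples give a count of 2 on every edge, while the last gives 5 sides along a and 6 along b. A count of 2 everywhere is necessary but not sufficient for a surface: the torus and Klein bottle stuck together at a point pass this test, yet the complex is still singular at the vertex.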

The construction of K(P) is the beginning of topo- 
logical group theory. The Seifert-van Kampen theorem 
(mentioned earlier) implies that the fundamental group 
of K(P) is the group presented by P. But the group 
no longer sits inertly in the form of an inscrutable
presentation: now it acts on the universal covering
[III.95] of K(P) by homeomorphisms known as “deck 
transformations.” Thus, through the simple construc- 
tion of K(P) (and the elegant theory of covering spaces 
in topology) we achieve our aim of realizing an abstract 
finitely presented group as the group of symmetries of 
an object with a potentially rich structure, on which we
can bring global geometric and topological techniques
to bear. 

To obtain an improved topological model for our
group, we can embed K(P) in $\mathbb{R}^5$ (just as one can embed
a finite graph [III.34] in $\mathbb{R}^3$) and consider the compact
four-dimensional manifold M obtained by taking all 
points that are a small fixed distance from the image. 
(I am assuming that the embedding is suitably “tame,” 
which one can arrange.) The mental picture to strive for 
here is a higher-dimensional analogue of the surface 
(sleeve) one gets by taking the points in $\mathbb{R}^3$ that are a
small fixed distance from an embedded graph. The fun- 
damental group of M is again the group presented by P, 
so now we have our arbitrary finitely presented group 
acting on a manifold (the universal cover of M). This 
allows us to use the tools of analysis and differential 
geometry. 

The constructions of K(P) and M establish the more 
difficult implication of the theorem, promised earlier, 
that a group can be finitely presented if and only if 
it is the fundamental group of a compact cell com- 
plex and of a compact 4-manifold. This result raises 
several natural questions. First, are there better, more 
informative, topological models for an arbitrary finitely 
presented group $\Gamma$? And if not, then what can one
say about the classes of groups defined by the natural
constraints that arise when one tries to improve
the model? For example, we would like to construct
a lower-dimensional manifold with fundamental group
$\Gamma$, enabling us to exploit our physical insight into
three-dimensional geometry. But it turns out that the
fundamental groups of compact three-dimensional manifolds
are very special; this observation lies near the
heart of a great deal of mathematics at the end of
the twentieth century. Other interesting fields open up
when one asks which groups arise as the fundamental
groups of compact spaces satisfying curvature
[III.13] conditions, or constraints coming from complex
geometry.

A particularly rich set of constraints comes from the 
following question. Can one arrange for an arbitrary 
finitely presented group to be the fundamental group 
of a compact space (a complex or manifold, perhaps) 
whose universal cover is contractible [IV.6 §2]? This 
is a natural question from the point of view of topology 
because a space with a contractible universal cover is, 
up to homotopy [IV.6 §2], completely determined by 
its fundamental group. If the fundamental group is $\Gamma$,
then such a space is called a classifying space for $\Gamma$ and
its homotopy-invariant properties provide a rich array
of invariants for the group $\Gamma$ (getting away from the
gross dependence that K(P) has on P rather than $\Gamma$).

If our earlier discussion of how hard it is to recognize 
$\Gamma$ from P has left you very skeptical about whether this
dependence can actually be removed, then your skep- 
ticism is well-founded: there are many obstructions to 
the construction of compact classifying spaces for an 
arbitrary finitely presented group; the study of them 
(under the generic name finiteness conditions) is a rich 
area at the interface of modern group theory, topology, 
and homological algebra. 

One aspect of this area is the search for natural 
conditions that ensure the existence of compact clas- 
sifying spaces (not necessarily manifolds). This is one 
of several places where manifestations of nonpositive 
curvature play a fundamental role in modern group 
theory. More combinatorial conditions also arise. For 
example, Lyndon proved that for any presentation
$P = \langle A \mid r \rangle$ where the single defining relation $r \in F(A)$ is
not a nontrivial power, the universal cover of K(P) is 
contractible. 

A neighboring and highly active area of research con- 
cerns questions of uniqueness and rigidity for classi- 
fying spaces. (Here, as is common, the word rigidity 
is used to describe a situation in which requiring two 
objects to be equivalent in an apparently weak sense 
forces them to be equivalent in an apparently stronger 
sense.) For example, the (open) Borel conjecture asserts 
that if two compact manifolds have isomorphic funda- 
mental groups and contractible universal covers, then 
those manifolds must be homeomorphic. 

I have been talking mostly about realizing groups as 
fundamental groups, which led to certain free actions. 
That is, we could interpret the elements of the group 
as symmetries of a topological space and none of these 
symmetries had any fixed points. Before moving on to 
geometric group theory I should point out that there 
are many situations in which the most illuminating 
actions of a group are not free: one instead allows well- 
understood stabilizers. (The stabilizer of a point is the 
set of all symmetries in the group that leave that point 
fixed.) For example, the natural way in which to study 
$\Gamma_\Delta$ is by its action on the triangulated plane, each vertex
of which is left unmoved by twelve symmetries. 

A deeper illustration of the merits of seeking insight 
into algebraic structure through nonfree actions on 
suitable topological spaces comes from the Bass-Serre 
theory of groups acting on trees, which subsumes the 
theory of amalgamated free products and HNN extensions,
whose potency we saw earlier. (This theory and
its extensions often go under the heading of arboreal
group theory.)

A tree is a connected graph that has no circuits in it. 
It is helpful to regard it as a metric space [III. 5 8] in 
which each edge has length 1. The group actions that 
one allows on trees are those that take edges to edges 
isometrically, never flipping an edge. 

If a group $\Gamma$ acts on a set $X$ (in other words, if it can
be regarded as a group of symmetries of $X$), then the
orbit of a point $x \in X$ is the set of all its images $gx$ with
$g \in \Gamma$. A group $\Gamma$ can be expressed as an amalgamated
free product $A *_C B$ if and only if it acts on a tree in such
a way that there are two orbits of vertices, one orbit of 
edges, and stabilizers A, B, C (where A and B are the 
stabilizers of adjacent vertices and intersect in C, which 
is the edge stabilizer). HNN extensions correspond to 
actions with one orbit of vertices and one orbit of edges. 
Thus, amalgamated free products and HNN extensions 
appear as graphs of groups, which are the basic objects
of Bass-Serre theory. These objects allow one to recover 
groups acting on trees from the quotient data of the 
action, i.e., the quotient space (which is a graph) and 
the pattern of edge and vertex stabilizers. 
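
Orbits and stabilizers are easy to compute in a finite example, and doing so may help to fix the definitions. The sketch below is not an action on a tree: it uses the eight symmetries of a square permuting its four corners, a small example of my own choosing, with each symmetry stored as a tuple whose entry in position i is the image of corner i.

```python
def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate_group(generators):
    """Close the generators under composition (enough for a finite group)."""
    identity = tuple(range(len(generators[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def orbit(group, x):
    """All images of the point x under the group."""
    return {g[x] for g in group}


def stabilizer(group, x):
    """All group elements that leave the point x fixed."""
    return {g for g in group if g[x] == x}


rotation = (1, 2, 3, 0)      # quarter turn: corner i goes to corner i + 1
reflection = (0, 3, 2, 1)    # flip across the diagonal through corners 0 and 2
symmetries = generate_group([rotation, reflection])
```

Here there is a single orbit of corners and the stabilizer of corner 0 has two elements, so the product of the orbit size and the stabilizer size recovers the order of the group.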

An early benefit of Bass-Serre theory is a transparent
and instructive proof that any finite subgroup of $A *_C B$
is conjugate to a subgroup of either $A$ or $B$: given any finite set
$V$ of vertices in a tree, there is a unique vertex or midpoint
$x$ minimizing $\max\{d(x,v) \mid v \in V\}$; one applies
this observation with $V$ an orbit of the finite subgroup;
$x$ provides a fixed point for the action of the subgroup;
and any point stabilizer is conjugate to a subgroup of
either $A$ or $B$.

Arboreal group theory goes much deeper than this 
first application suggests. It is the basis for a decompo- 
sition theory of finitely presented groups from which 
it emerges, for example, that there is an essentially 
canonical maximal splitting of an arbitrary finitely pre- 
sented group as a graph of groups with cyclic edge 
stabilizers. This provides a striking parallel with the 
decomposition theory of 3-manifolds, a parallel that 
extends far beyond a mere analogy and accounts for 
much of the deepest work in geometric group theory 
in the past ten years. If you want to learn more about 
this, search the literature for JSJ decompositions. You 
may also want to search for complexes of groups, which
provide the appropriate higher-dimensional analogue
for graphs of groups.

8 Geometric Group Theory

Let us refresh the image of $K(P)$ in our mind’s eye
by thinking again about the presentation
$P = \langle a, b \mid aba^{-1}b^{-1} \rangle$ of $\mathbb{Z}^2$. The complex $K(P)$, as we saw earlier,
is a torus. Now the torus can be defined as the quotient
of the Euclidean plane $\mathbb{R}^2$ by the action of the group
$\mathbb{Z}^2$ (where the point $(m, n) \in \mathbb{Z}^2$ acts as the translation
$(x, y) \mapsto (x + m, y + n)$): in fact, $\mathbb{R}^2$, with an appropriate
square tiling, is the universal cover of the torus. If
we look at the orbit of the point 0 under this action,
it forms a copy of $\mathbb{Z}^2$, and one can thereby see the
large-scale geometry of $\mathbb{Z}^2$ laid out for us. We can make
the idea of the “geometry of $\mathbb{Z}^2$” precise by decreeing
that edges of the tiling have length 1 and defining the
graph distance between vertices to be the length of the
shortest path of edges connecting them.

As this example shows, the construction of K(P) 
involves the two main (intertwined) strands of geomet- 
ric group theory. In the first and more classical strand, 
one studies actions of groups on metric and topologi- 
cal spaces in order to elucidate the structures of both 
the space and the group (as with the action of $\mathbb{Z}^2$ on the
plane in our example, or the action of the fundamental 
group of K(P) on its universal cover in general). The 
quality of the insights that one obtains varies accord- 
ing to whether the action has or does not have certain 
desirable properties. The action of $\mathbb{Z}^2$ on $\mathbb{R}^2$ consists of
isometries on a space with a fine geometric structure, 
and the quotient (the torus) is compact. Such actions 
are in many ways ideal, but sometimes one accepts 
weaker admission criteria in order to obtain a more 
diverse class of groups, and sometimes one demands 
even more structure in order to narrow the focus and 
study groups and spaces of an exceptional, but for that 
reason interesting, character. 

This first strand of geometric group theory mingles 
with the second. In the second strand, one regards 
finitely generated groups as geometric objects in their 
own right equipped with word metrics, which are 
defined as follows. Given a finite generating set $S$ for
a group $\Gamma$, one defines the Cayley graph of $\Gamma$ by joining
each element $\gamma \in \Gamma$ by an edge to each element of the
form $\gamma s$ or $\gamma s^{-1}$ with $s \in S$ (which is the same as the
graph formed by the edges of the universal covering
of $K(P)$). The distance $d_S(\gamma_1, \gamma_2)$ between $\gamma_1$ and $\gamma_2$ is
then the length of the shortest path from $\gamma_1$ to $\gamma_2$ if all
edges have length 1. Equivalently, it is the length of the
shortest word in the free group on $S$ that is equal to
$\gamma_1^{-1}\gamma_2$ in $\Gamma$.

The word metric and the Cayley graph depend on the
choice of generating set but their large-scale geometry
does not. In order to make this idea precise, we introduce
the notion of a quasi-isometry. This is an equivalence
relation that identifies spaces that are similar on
a large scale. If $X$ and $Y$ are two metric spaces, then a
quasi-isometry from $X$ to $Y$ is a function $\phi : X \to Y$
with the following two properties. First, there are positive
constants $c$, $C$, and $\epsilon$ such that
$c\,d(x,x') - \epsilon \leq d(\phi(x),\phi(x')) \leq C\,d(x,x') + \epsilon$:
this says that $\phi$ distorts sufficiently large distances by at most a constant
factor. Second, there is a constant $C'$ such that for every
$y \in Y$ there is some $x \in X$ for which $d(\phi(x),y) \leq C'$:
this says that $\phi$ is a “quasi-surjection” in the sense that
every element of $Y$ is close to the image of an element
of $X$.

Consider for example the two spaces $\mathbb{R}^2$ and $\mathbb{Z}^2$,
where the metric on $\mathbb{Z}^2$ is given by the graph distance
defined earlier. In this case the map $\phi : \mathbb{R}^2 \to \mathbb{Z}^2$
that takes $(x,y)$ to $(\lfloor x \rfloor, \lfloor y \rfloor)$ (where $\lfloor x \rfloor$ denotes the
largest integer less than or equal to $x$) is easily seen to
be a quasi-isometry: if the Euclidean distance $d$ between
two points $(x,y)$ and $(x',y')$ is at least 10, say, then
the graph distance between $(\lfloor x \rfloor, \lfloor y \rfloor)$ and $(\lfloor x' \rfloor, \lfloor y' \rfloor)$
will certainly lie between $\frac{1}{2}d$ and $2d$. Notice how little
we care about the local structure of the two spaces:
the map $\phi$ is a quasi-isometry despite not even being
continuous.
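
If you would like to see the bounds in action, the following sketch samples random pairs of points and checks the claim numerically. It is an experiment rather than a proof; the sample size, the range of the coordinates, and the function names are arbitrary choices of mine.

```python
import math
import random


def floor_map(point):
    """Send (x, y) to (floor x, floor y)."""
    x, y = point
    return (math.floor(x), math.floor(y))


def graph_distance(u, v):
    """Length of a shortest edge path between two vertices of the square tiling."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def bounds_hold(samples=10000, seed=0):
    """Check d/2 <= graph distance <= 2d for random pairs with d >= 10."""
    rng = random.Random(seed)
    for _ in range(samples):
        p = (rng.uniform(-100, 100), rng.uniform(-100, 100))
        q = (rng.uniform(-100, 100), rng.uniform(-100, 100))
        d = math.dist(p, q)
        if d < 10:
            continue                      # the claim only concerns large distances
        g = graph_distance(floor_map(p), floor_map(q))
        if not (d / 2 <= g <= 2 * d):
            return False
    return True
```

The reason the check succeeds is that the graph distance differs from $|x - x'| + |y - y'|$ by less than 2, and that sum lies between $d$ and $\sqrt{2}\,d$. The failure of continuity is equally visible: `floor_map((0.999, 0.0))` and `floor_map((1.0, 0.0))` are different vertices.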

It is not hard to check that if $\phi$ is a quasi-isometry
from $X$ to $Y$, then there is a quasi-isometry $\psi$ from $Y$ to
$X$ that “quasi-inverts” $\phi$, in the sense that every $x$ in $X$
is at most a bounded distance from $\psi\phi(x)$ and every $y$
in $Y$ is at most a bounded distance from $\phi\psi(y)$. Once
one has established this, it is easy to see that quasi- 
isometry is an equivalence relation. 

Returning to Cayley graphs and word metrics, it turns 
out that if you take two different sets of generators for 
the same group, then the resulting Cayley graphs will be 
quasi-isometric. Thus, any property of a Cayley graph 
that is invariant under quasi-isometry will be a property 
not just of the graph but of the group itself. When dealing
with such invariants we are free to think of $\Gamma$ itself
as a space (since we do not care which Cayley graph we
form), and we can replace it by any metric space that
is quasi-isometric to it, such as the universal cover of a
closed Riemannian manifold with fundamental group $\Gamma$
(whose existence we discussed earlier). Then the tools 
of analysis can be brought to bear on it. 
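
The dependence on the generating set, and the way it washes out on a large scale, can be seen in a small computation. The sketch below compares two word metrics on the group of pairs of integers: one for the standard generators (1, 0) and (0, 1), and one where (1, 1) is added as a third generator. Both generating sets and the search radii are my own choices for the illustration.

```python
from collections import deque


def word_lengths(generators, radius):
    """Breadth-first search in the Cayley graph of Z^2: map each element
    within the given radius of the identity to its word length."""
    steps = list(generators) + [(-dx, -dy) for dx, dy in generators]
    length = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        if length[(x, y)] == radius:
            continue                      # do not explore beyond the radius
        for dx, dy in steps:
            nxt = (x + dx, y + dy)
            if nxt not in length:
                length[nxt] = length[(x, y)] + 1
                queue.append(nxt)
    return length


standard = word_lengths([(1, 0), (0, 1)], radius=12)
enlarged = word_lengths([(1, 0), (0, 1), (1, 1)], radius=6)

# Each new generator has length at most 2 in the old metric, and each old
# generator is also a new one, so the two lengths differ by a factor of at most 2.
comparable = all(
    enlarged[g] <= standard[g] <= 2 * enlarged[g] for g in enlarged
)
```

The element (3, 3) has length 6 for the standard generators and length 3 for the enlarged set, so the metrics really are different, but the factor of 2 is the worst that can happen. In this example the identity map is a quasi-isometry between the two Cayley graphs.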

A fundamental fact, discovered independently by
many people and often called the Milnor-Švarc lemma,
provides a crucial link between the two main strands
of geometric group theory. Let us call a metric space
$X$ a length space if the distance between each pair of
points is the infimum of the lengths of paths joining
them. The Milnor-Švarc lemma states that if a group $\Gamma$
acts properly as a set of isometries of a length space
$X$, and if the quotient is compact, then $\Gamma$ is finitely
generated and quasi-isometric to $X$ (for any choice of word
metric).

We have seen an example of this already: $\mathbb{Z}^2$ is
quasi-isometric to the Euclidean plane. Less obviously, the
same is true of $\Gamma_\Delta$. (Consider the map that takes each
element $\alpha$ of $\Gamma_\Delta$ to the point of $\mathbb{Z}^2$ nearest $\alpha(0)$.)

The fundamental group of a compact Riemannian 
manifold is quasi-isometric to the universal cover of 
that manifold. Therefore, from the point of view of 
quasi-isometry invariants, the study of such manifolds 
is equivalent to the study of arbitrary finitely presented 
groups. In a moment we will discuss some nontriv- 
ial consequences of this equivalence. But first let us 
reflect on the fact that, when finitely generated groups
are considered as metric objects in the framework of 
large-scale geometry, they present us with a new chal- 
lenge: we should classify finitely generated groups up to 
quasi-isometry. 

This is an impossible task, of course, but neverthe- 
less serves as a beacon in modern geometric group 
theory, one that has guided us toward many beauti- 
ful theorems, particularly under the general heading of 
rigidity. For example, suppose that you come across a 
finitely generated group $\Gamma$ that is reminiscent of $\mathbb{Z}^n$ on
a large scale: in other words, quasi-isometric to it. We
are not necessarily given any algebraically defined map
between this mystery group and $\mathbb{Z}^n$, and yet it transpires
that such a group must contain a copy of $\mathbb{Z}^n$ as
a subgroup of finite index.

At the heart of this result is Gromov’s polynomial 
growth theorem, a landmark theorem published in 
1981. This theorem concerns the number of points 
within a distance $r$ of the identity in a finitely generated
group $\Gamma$. This will be a function $f(r)$, and Gromov was
interested in how the function $f(r)$ grows as $r$ tends
to infinity, and what that tells us about the group $\Gamma$.

If $\Gamma$ is an Abelian group with $d$ generators, then it is
not hard to see that $f(r)$ is at most $(2r + 1)^d$ (since
each generator is raised to a power between $-r$ and $r$).
Thus, in this case $f(r)$ is bounded above by a polynomial
in $r$. At the other extreme, if $\Gamma$ is a free group with
two generators $a$ and $b$, say, then $f(r)$ is exponentially
large, since all sequences of length $r$ that consist of $a$s
and $b$s (and not their inverses) give different elements
of $\Gamma$.
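
The contrast is easy to observe by direct computation. The following sketch counts the elements within distance r of the identity by a breadth-first search, once for the free Abelian group on two generators and once for the free group on two generators. The encoding of free group elements as reduced strings, with uppercase letters for inverses, is my own convention.

```python
def ball_sizes(identity, neighbors, max_radius):
    """Return [f(0), f(1), ..., f(max_radius)], where f(r) counts the
    elements within word distance r of the identity."""
    seen = {identity}
    frontier = [identity]
    sizes = [1]
    for _ in range(max_radius):
        next_frontier = []
        for g in frontier:
            for h in neighbors(g):
                if h not in seen:
                    seen.add(h)
                    next_frontier.append(h)
        frontier = next_frontier
        sizes.append(len(seen))
    return sizes


def abelian_neighbors(g):
    """Multiply an element of Z^2 by each generator and each inverse."""
    x, y = g
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def free_neighbors(word):
    """Multiply a reduced word by a, A, b, B on the right, cancelling if needed."""
    result = []
    for s in "aAbB":
        if word and word[-1] == s.swapcase():
            result.append(word[:-1])      # the new letter cancels the last one
        else:
            result.append(word + s)
    return result


abelian_growth = ball_sizes((0, 0), abelian_neighbors, 4)
free_growth = ball_sizes("", free_neighbors, 4)
```

The first list is 1, 5, 13, 25, 41, which matches the quadratic $2r^2 + 2r + 1$ and stays below $(2r + 1)^2$; the second is 1, 5, 17, 53, 161, which matches $2 \cdot 3^r - 1$.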

Given this sharp contrast in behavior, one might wonder
whether requiring $f(r)$ to be bounded above by
a polynomial forces $\Gamma$ to exhibit a great deal of
commutativity. Fortunately, there is a much-studied definition
that makes this idea precise. Given any group
$G$ and any subgroup $H$ of $G$, the commutator $[G,H]$
is the subgroup generated by all elements of the form
$ghg^{-1}h^{-1}$, where $g$ belongs to $G$ and $h$ belongs to $H$. If
$G$ is Abelian, then $[G,H]$ contains just the identity. If $G$
is not Abelian, then $[G,G]$ forms a group $G_1$ that contains
other elements besides the identity, but it may be
that $[G,G_1]$ is trivial. In that case, one says that $G$ is a
two-step nilpotent group. In general, a $k$-step nilpotent
group $G$ is one where, if you form a sequence by setting
$G_0 = G$ and $G_{i+1} = [G,G_i]$ for each $i$, then you eventually
reach the trivial group, and the first time you do
so is at $G_k$. A nilpotent group is a group that is $k$-step
nilpotent for some $k$.
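
A standard example of a two-step nilpotent group, not discussed in the text above, is the group of 3 x 3 upper triangular integer matrices with 1s on the diagonal (the Heisenberg group). The sketch below stores such a matrix by its three entries above the diagonal, a shorthand of my own, and computes commutators in the form $ghg^{-1}h^{-1}$.

```python
def multiply(g, h):
    """Product of the matrices [[1, a, c], [0, 1, b], [0, 0, 1]] stored as (a, b, c)."""
    a, b, c = g
    a2, b2, c2 = h
    return (a + a2, b + b2, c + c2 + a * b2)


def inverse(g):
    """Inverse of the matrix stored as (a, b, c)."""
    a, b, c = g
    return (-a, -b, a * b - c)


def commutator(g, h):
    """The element g h g^{-1} h^{-1}."""
    return multiply(multiply(multiply(g, h), inverse(g)), inverse(h))


x = (1, 0, 0)
y = (0, 1, 0)
z = commutator(x, y)             # the corner entry is all that survives

some_elements = [(2, -3, 5), (-1, 4, 0), (7, 7, -2)]
second_step = [commutator(g, commutator(h, k))
               for g in some_elements for h in some_elements for k in some_elements]
```

Every commutator has the form (0, 0, n), and such elements commute with everything, so every entry of `second_step` is the identity (0, 0, 0). In the notation above, $G_1$ is nontrivial because it contains `z`, while $G_2$ is trivial.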

Gromov’s theorem states that a group has polynomial
growth if and only if it has a nilpotent subgroup of
finite index. This is a quite extraordinary fact: the
polynomial growth condition is easily seen to be independent
of the choice of word metric and to be an invariant
of quasi-isometry. Thus the seemingly rigid and purely
algebraic condition of having a nilpotent subgroup of
finite index is in fact a quasi-isometry invariant, and
therefore a flabby, robust characteristic of the group.

In the past fifteen years quasi-isometric rigidity the- 
orems have been established for many other classes 
of groups, including lattices in semisimple Lie groups 
and the fundamental groups of compact 3-manifolds 
(where the classification up to quasi-isometry involves 
more than algebraic equivalences), as well as various 
classes defined in terms of their graph of group decom- 
positions. In order to prove theorems of this type, one 
must identify nontrivial invariants of quasi-isometry 
that allow one to distinguish and relate various classes 
of spaces. In many cases such invariants come from 
the development of suitable analogues of the tools of 
algebraic topology, modified so that they behave well 
with respect to quasi-isometries rather than continuous 
maps. 

9 The Geometry of the Word Problem 

It is time to explain the comments I made earlier 
about the geometry inherent in the basic decision prob- 
lems of combinatorial group theory. I shall concentrate 
exclusively on the geometry of the word problem.

Gromov’s filling theorem describes a startlingly intimate
connection between the highly geometric study
of disks with minimal area in Riemannian geometry
[I.3 §6.10] and the study of word problems, which
seems to belong more to algebra and logic.

On the geometric side, the basic object of study is
the isoperimetric function $\mathrm{Fill}_M(l)$ of a smooth compact
manifold $M$. Given any closed path of length $l$, there is a
disk of minimal area that is bounded by that path. The
largest such area, over all closed paths of length $l$, is
defined to be $\mathrm{Fill}_M(l)$. Thus, the isoperimetric function
is the smallest function of which it is true to say that
every closed path of length $l$ can be filled by a disk of
area at most $\mathrm{Fill}_M(l)$.

The image to have in mind here is that of a soap film:
if one twists a circular wire of length $l$ in Euclidean
space and dips it in soap, the film that forms has area at
most $l^2/4\pi$, whereas if one performs the same experiment
in hyperbolic space [I.3 §6.6], the area of the film
is bounded by a linear function of $l$. Correspondingly,
the isoperimetric functions of $\mathbb{E}^n$ and $\mathbb{H}^n$ (and quotients
of them by groups of isometries) are quadratic
and linear, respectively. In a moment we shall discuss
what types of isoperimetric functions arise when one
considers other geometries (more precisely, compact
Riemannian manifolds).

To state the filling theorem we need to think about
the algebraic side as well. Here, we identify a function
that measures the complexity of a direct attack on the
word problem for an arbitrary finitely presented group
$\Gamma = \langle A \mid R \rangle$. If we wish to know whether a word w
equals the identity in $\Gamma$ and do not have any further
insight into the nature of $\Gamma$, then there is not much we
can do other than repeatedly insert or remove the given
relations $r \in R$.

Consider the simple example $\Gamma = \langle a, b \mid b^2a,\ baba \rangle$.
In this group $aba^2b$ represents the identity. How do we
prove this? Well,

$$aba^2b = a(b^2a)ba^2b = ab(baba)ab = abab = a(baba)a^{-1} = aa^{-1} = 1.$$
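
The chain of equalities above can be replayed mechanically. The following is a minimal sketch, assuming that words are encoded as Python strings in which an uppercase letter stands for the inverse of the corresponding generator (so `A` means $a^{-1}$); the helper names `free_reduce`, `insert_at`, and `delete_at` are illustrative and do not come from the text. Each insertion or deletion is checked literally, and only moves that use a relator are counted.

```python
def free_reduce(word):
    """Cancel adjacent inverse pairs such as 'aA' or 'Bb' until none remain."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


def insert_at(word, pos, piece):
    """Insert `piece` (a relator or a trivial loop) at index `pos`."""
    return word[:pos] + piece + word[pos:]


def delete_at(word, pos, piece):
    """Delete `piece` at index `pos`, checking that it really occurs there."""
    assert word[pos:pos + len(piece)] == piece, "piece not found at this position"
    return word[:pos] + word[pos + len(piece):]


B2A, BABA = "bba", "baba"  # the two relators b^2 a and baba


def replay_derivation():
    """Follow the derivation step by step; return (final word, relator moves)."""
    word, relator_moves = "abaab", 0      # the word a b a^2 b
    word = insert_at(word, 1, B2A)        # a(b^2 a)b a^2 b
    relator_moves += 1
    word = delete_at(word, 2, BABA)       # ab(baba)ab  ->  abab
    relator_moves += 1
    word = insert_at(word, 4, "aA")       # free move: abab = a(baba)a^{-1}
    word = delete_at(word, 1, BABA)       # a(baba)a^{-1}  ->  a a^{-1}
    relator_moves += 1
    return free_reduce(word), relator_moves


final_word, moves = replay_derivation()
```

Here `final_word` is the empty string, which encodes the identity, and `moves` records that three applications of relators were used, so the derivation shows that the area of this word is at most 3.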

Now let us think about the proof geometrically, via the
Cayley graph. Since $aba^2b = 1$ in the group $\Gamma$, we
obtain a cycle in this graph if we start at the identity
and go along edges labeled a, b, a, a, b, in that order
(in which case we visit the vertices 1, a, ab, aba, $aba^2$,
$aba^2b = 1$). The equalities in the proof can be thought
of as a way of “contracting” this cycle down to the identity
by means of inserting or deleting small loops: for
instance, we could insert b, a, b, a into the list of edge
directions, since baba is a relation, or we could delete
a trivial loop of the form a, $a^{-1}$. This contraction can
be given a more topological character if we turn our
Cayley graph into a two-dimensional complex by filling
in each small loop with a face. Then the contraction of
the original cycle consists in gradually moving it across
these small faces.

Thus, the difficulty of demonstrating that a word w
equals the identity is intimately connected with the
area of w, denoted Area(w), which can be thought of
algebraically as the length of the shortest sequence of
relations you need to insert and delete to turn w into
the identity, or geometrically as the smallest number of
faces you need to make a disk that fills the cycle
represented by w.

The Dehn function $\delta_\Gamma : \mathbb{N} \to \mathbb{N}$ bounds Area(w) in
terms of the length |w| of the word w: $\delta_\Gamma(n)$ is the
largest area of any word of length at most n that equals
1 in $\Gamma$. If the Dehn function grows rapidly, then the
word problem is hard, since there are short words that
are equal to the identity, but their area is very large,
so that any demonstration that they are equal to the
identity has to be very long. Results bounding the Dehn
function are called isoperimetric inequalities.

The subscript on $\delta_\Gamma$ is somewhat misleading since
different finite presentations of the same group will
in general yield different Dehn functions. This ambiguity
is tolerated because it is tightly controlled: if the
groups defined by two finite presentations are isomorphic,
or just quasi-isometric, then the corresponding
Dehn functions have similar growth rates. More precisely,
they are equivalent with respect to what is sometimes
called the standard equivalence relation of
geometric group theory: given two monotone functions
$f, g : [0, \infty) \to [0, \infty)$, one writes $f \preceq g$ if there exists a
constant $C > 0$ such that $f(l) \le C\,g(Cl + C) + Cl + C$ for
all $l \ge 0$, and $f \simeq g$ if $f \preceq g$ and $g \preceq f$; and one extends
this relation to include functions from $\mathbb{N}$ to $[0, \infty)$.
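
To get a feeling for this relation, one can test the defining inequality numerically. The sketch below is only a finite check, under the assumption that the functions are sampled at the integers 0, 1, ..., `l_max` for one chosen constant C; it can exhibit a witness C on a range or find a counterexample for a given C, but it proves nothing about all l. The names `dominated` and `first_failure` are illustrative.

```python
def bound(g, C, l):
    """Right-hand side C*g(C*l + C) + C*l + C of the defining inequality."""
    return C * g(C * l + C) + C * l + C


def dominated(f, g, C, l_max):
    """Check f(l) <= C*g(C*l + C) + C*l + C for all integers 0 <= l <= l_max."""
    return all(f(l) <= bound(g, C, l) for l in range(l_max + 1))


def first_failure(f, g, C, l_max):
    """Return the first integer l in range where the inequality fails, or None."""
    for l in range(l_max + 1):
        if f(l) > bound(g, C, l):
            return l
    return None


def square(n):
    return n * n


def messy_square(n):
    return 5 * n * n + 3 * n + 7


def cube(n):
    return n ** 3


def zero(n):
    return 0


def linear(n):
    return n


# Two quadratic functions dominate each other with small constants.
quadratics_agree = (dominated(messy_square, square, 2, 10_000)
                    and dominated(square, messy_square, 1, 10_000))

# With C = 10, the cubic eventually escapes the bound given by the quadratic.
cubic_escape = first_failure(cube, square, 10, 5_000)

# The additive term C*l + C absorbs everything of at most linear growth.
linear_absorbed = dominated(linear, zero, 1, 10_000)
```

The last check illustrates a design feature of the relation: because of the additive term Cl + C, it makes no distinction between functions that grow at most linearly.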

You will have noticed a resemblance between the
definitions of $\mathrm{Fill}_M(l)$ and $\delta_\Gamma(n)$. The filling theorem
relates them precisely: it states that if M is a smooth
compact manifold, then $\mathrm{Fill}_M(l) \simeq \delta_\Gamma(l)$, where $\Gamma$ is the
fundamental group $\pi_1 M$ of M.

For example, since $\mathbb{Z}^2$ is the fundamental group of
the torus $T = \mathbb{R}^2/\mathbb{Z}^2$, which has Euclidean geometry,
$\delta_{\mathbb{Z}^2}(l)$ is quadratic.

9.1 What Are the Dehn Functions? 

We have seen that the complexity of word problems
is related to the study of isoperimetric problems in
Riemannian and combinatorial geometry. Such insights
have, in the last fifteen years, led to great advances in
the understanding of the nature of Dehn functions. For
example, one can ask for which numbers p the function
$n^p$ is a Dehn function. The set of all such numbers,
which can be shown to be countable, is known as the
isoperimetric spectrum, denoted IP, and it is now largely
understood.

Following work by many authors, Brady and Bridson
proved that the closure of IP is $\{1\} \cup [2, \infty)$. The
finer structure of IP was described by Birget, Rips,
and Sapir in terms of the time functions of Turing
machines. A further result by the same authors
and Ol’shanskii explains how fundamental Dehn functions
are to understanding the complexity of arbitrary
approaches to the word problem for finitely generated
groups $\Gamma$: the word problem for $\Gamma$ lies in NP if and only if
$\Gamma$ is a subgroup of a finitely presented group with
polynomial Dehn function. (Here, NP is the class of problems
in the famous “P versus NP” question: see computational
complexity [IV.20 §3] for a description of
this class.)

The structure of IP raises an obvious question: What 
can one say about the two classes of groups singled out 
as special — those with linear Dehn functions and those 
with quadratic ones? The true nature of the class of 
groups with a quadratic Dehn function remains obscure 
for the moment but there is a beautifully definitive 
description of those with a linear Dehn function: they 
are the word hyperbolic groups, which we shall discuss 
in the next section. 

Not all Dehn functions are of the form $n^p$: there
are Dehn functions such as $n^p \log n$, for example,
and others that grow more quickly than any iterated
exponential, for example that of

$$\langle a, b \mid aba^{-1}\,b\,ab^{-1}a^{-1}\,b^{-2} \rangle.$$

If $\Gamma$ has unsolvable word problem, then $\delta_\Gamma(n)$ will grow
faster than any recursive function (indeed this serves
as a definition of such groups).

9.2 The Word Problem and Geodesics 

A closed geodesic on a Riemannian manifold is a loop 
that locally minimizes distance, such as a loop formed 
by an elastic band when released on a perfectly smooth 
surface. Examples such as the great circles on a sphere 
or the waist of an hourglass show that manifolds may 
contain closed geodesics that are null-homotopic: that
is, they can be moved continuously until they are
reduced to a point. But can one construct a compact
topological manifold with the property that no matter
what metric one puts on it there will always be infinitely 
many such geodesics? (Technically, if you go around a 
geodesic loop n times, then you get a geodesic; we avoid 
this by counting only “primitive” geodesics.) 

From a purely geometric point of view this is a daunt- 
ing problem: all specific metric information has been 
stripped away and one has to deal with an arbitrary 
metric on the floppy topological object left behind. But 
group theory provides a solution: if the Dehn function
of the fundamental group $\pi_1 M$ grows at least as fast as
$2^{2^n}$, then in any Riemannian metric on M there will be
infinitely many closed geodesics that are null-homotopic.
The proof of this is too technical to sketch here. 

10 Which Groups Should One Study? 

Several special classes of groups have emerged from
our previous discussion, such as nilpotent groups,
3-manifold groups, groups with linear Dehn functions,
and groups with a single defining relation. Now we shall
change viewpoint and ask which groups present themselves
for study as we set out to explore the universe of
all finitely presented groups, starting with the easiest
groups.

The trivial group comes first, of course, followed by 
the finite groups. Finite groups are discussed in vari- 
ous other places in this volume, so I shall ignore them 
in what follows and adopt the approach of large-scale 
geometry, blurring the distinction between groups that 
have a common subgroup of finite index. 

The first infinite group is surely Z, but what comes
next is open to debate. If one wants to retain the
safety of commutativity, then finitely generated Abelian
groups come next. Then, as one slowly relinquishes
commutativity and control over growth and
constructibility, one passes through the progressively
larger classes of nilpotent, polycyclic, solvable, and
elementary amenable groups. We have already met nilpotent
groups in our discussion of Gromov’s polynomial-growth
theorem. They crop up in many contexts as
the most natural generalization of Abelian groups and
much is known about them, not least because one can
prove a great deal by induction on the k for which they
are k-step nilpotent. One can also exploit the fact that
G is built from the finitely generated Abelian groups
$G_i/G_{i+1}$ in a very controlled way. The larger class of
polycyclic groups is built in a similar way, while finitely
generated solvable groups are built in a finite number
of steps from Abelian groups that need not be finitely
generated. This last class is not only larger but wilder;
the isomorphism problem is solvable among polycyclic
groups, for example, but unsolvable among solvable
groups. By definition a group G is solvable if its derived
series, defined inductively by $G^{(n)} = [G^{(n-1)}, G^{(n-1)}]$
with $G^{(0)} = G$, terminates in a finite number of steps.
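
For a small finite group the derived series can be computed by brute force. The sketch below is an illustration only, under the assumption that the group is given as a set of permutations stored as tuples; the helper names are hypothetical, and finite groups are used purely because they can be enumerated (the text itself is concerned with infinite groups). It lists the orders of the successive derived subgroups until they stop changing.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


def generated_subgroup(gens, identity):
    """Closure of `gens` under multiplication (a subgroup, since all is finite)."""
    elems = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return elems


def derived_subgroup(group, identity):
    """Subgroup generated by all commutators x y x^{-1} y^{-1}."""
    commutators = {
        compose(compose(x, y), compose(inverse(x), inverse(y)))
        for x in group
        for y in group
    }
    return generated_subgroup(commutators, identity)


def derived_series_orders(group, identity):
    """Orders of G^(0), G^(1), ... until the series stops shrinking."""
    orders = [len(group)]
    while True:
        nxt = derived_subgroup(group, identity)
        if len(nxt) == len(group):
            return orders
        orders.append(len(nxt))
        group = nxt


def symmetric_group(n):
    return set(permutations(range(n))), tuple(range(n))


s4_orders = derived_series_orders(*symmetric_group(4))
s5_orders = derived_series_orders(*symmetric_group(5))
```

A finite group is solvable exactly when the last order in this list is 1. For the symmetric group on four points the series reaches the trivial group, whereas for five points it stabilizes at the alternating group, which is its own derived subgroup.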

The concept known as amenability forms an impor- 
tant link between geometry, analysis, and group theory. 
Solvable groups are amenable but not vice versa. It is 
not quite the case that a finitely presented group is 
amenable if and only if it does not contain a free sub- 
group of rank 2, but for a novice this serves as a good 
rule of thumb. 

Now, let us return to Z in a more adventurous frame 
of mind, throw away the security of commutativity, and 
start taking free products instead. In this more liber- 
ated approach, finitely generated free groups appear 
after Z as the first groups in the universe. What comes
next? Thinking geometrically, we might note that free 
groups are precisely those groups that have a tree as a 
Cayley graph and then ask which groups have Cayley 
graphs that are tree-like. 

A key property of a tree is that all of its triangles are 
degenerate: if you take any three points in the tree and 
join them by shortest paths, then every point in one of 
these paths is contained in at least one other path as 
well. This is a manifestation of the fact that trees are
spaces of infinite negative curvature. To get a feeling
for why, consider what happens when one rescales the
metric on a space of bounded negative curvature such
as the hyperbolic plane $\mathbb{H}^2$. If we replace the standard
distance function $d(x, y)$ by $(1/n)\,d(x, y)$ and let n
tend to $\infty$, then the curvature of this space (in the
classical sense of differential geometry) tends to $-\infty$. This
is captured by the fact that triangles look increasingly
degenerate: there is a constant $\delta(n)$, with $\delta(n) \to 0$
as $n \to \infty$, such that any side of a triangle in the scaled
hyperbolic space $(\mathbb{H}^2, (1/n)d)$ is contained in the
$\delta(n)$-neighborhood of the union of the other two sides. More
colloquially, triangles in $\mathbb{H}^2$ are uniformly thin and get
increasingly thin as one rescales the metric.

With this picture in mind, one might move a little 
away from trees by asking which groups have Cayley 
graphs in which all triangles are uniformly thin. (It 
makes little sense to specify the thinness constant $\delta$
since it will change when one changes generating set.) 
The answer is Gromov’s hyperbolic groups. This is a 
fascinating class of groups that has many equivalent 
definitions and arises in many contexts. For example, 
we have already met it as the class of groups that have 
linear Dehn functions. (It is not at all obvious that these
two definitions are equivalent.) 

Gromov’s great insight is that because the thin-tri- 
angles condition encapsulates so much of the essence 
of the large-scale geometry of negatively curved mani- 
folds, hyperbolic groups share many of the rich proper- 
ties enjoyed by the groups that act nicely by isometries 
on such spaces. Thus, for example, hyperbolic groups 
have only finitely many conjugacy classes of finite subgroups,
contain no copy of $\mathbb{Z}^2$, and (after accounting
for torsion) have compact classifying spaces. Their con- 
jugacy problems can be solved in less than quadratic 
time, and Sela showed that one can even solve the 
isomorphism problem among torsion-free hyperbolic 
groups. In addition to their many fascinating proper- 
ties and natural definition, a further source of interest 
in hyperbolic groups is the fact that, in a precise statistical
sense, a random finitely presented group will be
hyperbolic. 

Spaces of negative and nonpositive curvature have 
played a central role in many branches of mathemat- 
ics in the last twenty years. There is no room even to 
begin to justify this assertion here but it does guide us 
in where to look for natural enlargements of the class 
of hyperbolic groups: we want nonpositively curved 
groups, defined by requiring that their Cayley graphs 
enjoy a key geometric feature that cocompact groups 
of isometries inherit from simply connected spaces of 
nonpositive curvature (“CAT(0) spaces”). But in contrast
to the hyperbolic case, the class of groups that one
obtains varies considerably when one perturbs the def- 
inition, and delineating the resulting classes and their 
(rich) properties has been the subject of much research. 

The added complications that one encounters when 
one moves from negative to nonpositive curvature are 
exemplified by the fact that the isomorphism problem
is unsolvable in one of the most prominent classes that 
arises: the so-called combable groups. 

Let us now return to free groups and ask which 
hyperbolic groups are the immediate neighbors of free 
groups. Remarkably, this vague question has a convinc- 
ing answer. 

One of the great triumphs of arboreal group theory 
is the proof that there is a finite description of the 
set Hom(G, F) of homomorphisms from an arbitrary 
finitely generated group G to a free group F. The basic 
building blocks in this description are what Sela calls
limit groups. One of the many ways of defining a limit
group L is that for each finite subset $X \subset L$ there
should exist a homomorphism to a finitely generated
free group that is injective on X.

Limit groups can also be defined as those whose
first-order logic [IV.23 §1] resembles that of a free 
group in a precise sense. To see how first-order logic 
can be used to say something nontrivial about a group, 
consider the sentence 

$$\forall x, y, z \quad (xy \ne yx) \vee (yz \ne zy) \vee (xz = zx) \vee (y = 1).$$

A group with this property is commutative transitive: if
x commutes with $y \ne 1$, and y commutes with z, then
x commutes with z. Free groups and Abelian groups 
have this property but a direct product of non-Abelian 
free groups, for example, does not. 
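
The sentence above can be tested directly on a finite group by enumerating all triples. The sketch below is an illustration under stated assumptions: finite groups stand in for the infinite groups of the text, with the symmetric group on three points playing the part of a non-Abelian factor, and the function name `is_commutative_transitive` is hypothetical. It checks the equivalent statement that, for every y other than the identity, all elements commuting with y commute with one another.

```python
from itertools import permutations, product


def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[i] for i in q)


def is_commutative_transitive(elements, mul, identity):
    """True if x~y and y~z with y != 1 always force x~z (~ means 'commutes')."""
    def commute(u, v):
        return mul(u, v) == mul(v, u)

    for y in elements:
        if y == identity:
            continue
        centralizer = [x for x in elements if commute(x, y)]
        if any(not commute(x, z) for x in centralizer for z in centralizer):
            return False
    return True


# The symmetric group on three points (non-Abelian).
S3 = list(permutations(range(3)))
E3 = tuple(range(3))

# Its direct product with itself, multiplied coordinate by coordinate.
S3xS3 = list(product(S3, S3))


def pair_mul(u, v):
    return (compose(u[0], v[0]), compose(u[1], v[1]))


# The cyclic group of order 6, written additively (Abelian).
Z6 = list(range(6))


def add_mod6(u, v):
    return (u + v) % 6


s3_is_ct = is_commutative_transitive(S3, compose, E3)
product_is_ct = is_commutative_transitive(S3xS3, pair_mul, (E3, E3))
z6_is_ct = is_commutative_transitive(Z6, add_mod6, 0)
```

The direct product fails for the reason suggested in the text: an element of the form (1, t) commutes with every element (s, 1), but two such elements (s, 1) and (s', 1) need not commute with each other.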

It is a simple exercise to show that free Abelian 
groups are limit groups. But if one restricts attention 
to groups that have precisely the same first-order logic 
as free groups, one gets a smaller class consisting only 
of hyperbolic groups. The groups in this class are the 
subject of intense scrutiny at the moment. They all have
negatively curved two-dimensional classifying spaces, 
built from graphs and hyperbolic surfaces in a hierarchical
manner. The fundamental groups $\Sigma_g$ of closed
surfaces of genus $g \ge 2$ lie in this class, lending substance
to the traditional opinion in combinatorial group
theory that, among nonfree groups, it is the groups $\Sigma_g$
that resemble free groups $F_n$ most closely.

Incorporating this opinion into our earlier discussion,
we arrive at the view that the groups $\mathbb{Z}^n$, the
free groups $F_n$, and the groups $\Sigma_g$ are the most basic
of infinite groups. This is the start of a rich vein of
ideas involving the automorphisms of these groups.
In particular, there are many striking parallels between
their outer automorphism groups $\mathrm{GL}_n(\mathbb{Z})$, $\mathrm{Out}(F_n)$, and
$\mathrm{Mod}_g \cong \mathrm{Out}(\Sigma_g)$ (the mapping class group). These
three classes of groups play a fundamental role across 
a broad spectrum of mathematics. I have mentioned
them here in order to make the point that, beyond the 
search for knowledge about natural classes of groups, 
there are certain “gems” in group theory that merit a 
deep and penetrating study in their own right. Other 
groups that people might suggest for this category 
include Coxeter groups (generalized reflection groups, 
for which Ia is a prototype) and Artin groups (particu- 
larly braid groups [III.4], which again crop up in many 
branches of mathematics). 

I have thrown classes of groups at you thick and 
fast in this last section. Even so, there are many fas- 
cinating classes of groups and important issues that 
I have ignored completely. But so it must be, for as
Higman’s theorem assures us, the challenges, joys, and 
frustrations of finitely presented groups can never be 
exhausted. 

IV.11 Harmonic Analysis

Terence Tao 


1 Introduction 

Much of analysis tends to revolve around the study of 
general classes of functions [1.2 §2.2] and operators 
[III.52]. The functions are often real-valued or complex- 
valued, but may take values in other sets, such as a 
vector space [1.3 §2.3] or a manifold [1.3 §6.9]. An 
operator is itself a function, but at a “second level,” 
because its domain and range are themselves spaces of 
functions: that is, an operator takes a function (or per- 
haps more than one function) as its input and returns a 
transformed function as its output. Harmonic analysis 
focuses in particular on the quantitative properties of 
such functions, and how these quantitative properties 
change when various operators are applied to them. 1 

What is a “quantitative property” of a function? Here
are two important examples. First, a function f is said
to be uniformly bounded if there is some real number
M such that $|f(x)| \le M$ for every x. It can often be
useful to know that two functions f and g are “uniformly
close,” which means that their difference $f - g$
is uniformly bounded with a small bound M. Second,
a function is called square integrable if the integral
$\int |f(x)|^2\,dx$ is finite. The square integrable functions
are important because they can be analyzed using the
theory of Hilbert spaces [III.37].

1. Strictly speaking, this sentence describes the field of
real-variable harmonic analysis. There is another field called
abstract harmonic analysis, which is primarily concerned with
how real- or complex-valued functions (often on very general
domains) can be studied using symmetries such as translations
or rotations (for instance, via the Fourier transform and its
relatives); this field is of course related to real-variable
harmonic analysis, but is perhaps closer in spirit to
representation theory and functional analysis, and will not be
discussed here.

A typical question in harmonic analysis might then
be the following: if a function $f : \mathbb{R}^n \to \mathbb{R}$ is square
integrable, its gradient $\nabla f$ exists, and all the n components
of $\nabla f$ are also square integrable, does this imply
that f is uniformly bounded? (The answer is yes when
n = 1, and no, but only just, when n = 2; this is a special
case of the Sobolev embedding theorem, which is
of fundamental importance in the analysis of partial
differential equations [IV.12].) If so, what are the
precise bounds one can obtain? That is, given the integrals
of $|f|^2$ and $|(\nabla f)_i|^2$, what can you say about the
uniform bound M that you obtain for f?

Real and complex functions are of course very famil- 
iar in mathematics, and one meets them in high school. 
In many cases one deals primarily with special func- 
tions [III.87]: polynomials, exponentials, trigonomet- 
ric functions, and other very concrete and explicitly 
defined functions. Such functions typically have a very 
rich algebraic and geometric structure, and many ques- 
tions about them can be answered exactly using tech- 
niques from algebra and geometry. 

However, in many mathematical contexts one has to 
deal with functions that are not given by an explicit 
formula. For example, the solutions to ordinary and 
partial differential equations often cannot be given in 
an explicit algebraic form (as a composition of familiar
functions such as polynomials, exponential functions
[III.25], and trigonometric functions [III.94]).
In such cases, how does one think about a function? 
The answer is to focus on its properties and see what 
can be deduced from them: even if the solution of a 
differential equation cannot be described by a useful 
formula, one may well be able to establish certain basic 
facts about it and be able to derive interesting consequences
from those facts. Some examples of properties
that one might look at are measurability, boundedness,
continuity, differentiability, smoothness, analyticity,
integrability, or quick decay at infinity. One is
thus led to consider interesting general classes of func- 
tions: to form such a class one chooses a property and 
takes the set of all functions with that property. Gen- 
erally speaking, analysis is much more concerned with 
these general classes of functions than with individual 
functions. (See also function spaces [III.29].) 

This approach can in fact be useful even when one
is analyzing a single function that is very structured
and has an explicit formula. It is not always easy, or
even possible, to exploit this structure and formula in
a purely algebraic manner, and then one must rely (at
least in part) on more analytical tools instead. A typical
example is the Airy function

$$\mathrm{Ai}(x) = \int_{-\infty}^{\infty} e^{i(x\xi + \xi^3)}\,d\xi.$$

Although this is defined explicitly as a certain integral,
if one wants to answer such basic questions as whether
Ai(x) is always a convergent integral, and whether this
integral goes to zero as $x \to \pm\infty$, it is easiest to proceed
using the tools of harmonic analysis. In this case, one
can use a technique known as the principle of stationary
phase to answer both these questions affirmatively,
although there is the rather surprising fact that the Airy
function decays almost exponentially fast as $x \to +\infty$,
but only polynomially fast as $x \to -\infty$.

Harmonic analysis, as a subfield of analysis, is particularly
concerned not just with qualitative properties
like the ones mentioned earlier, but also with
quantitative bounds that relate to those properties. For
instance, instead of merely knowing that a function
f is bounded, one may wish to know how bounded
it is. That is, what is the smallest $M \ge 0$ such that
$|f(x)| \le M$ for all (or almost all) $x \in \mathbb{R}$? This number
is known as the sup norm or $L^\infty$-norm of f, and
is denoted $\|f\|_{L^\infty}$. Or instead of assuming that f is
square integrable one can quantify this by introducing
the $L^2$-norm $\|f\|_{L^2} = (\int |f(x)|^2\,dx)^{1/2}$; more generally
one can quantify pth-power integrability for $0 < p < \infty$
via the $L^p$-norm $\|f\|_{L^p} = (\int |f(x)|^p\,dx)^{1/p}$. Similarly,
most of the other qualitative properties mentioned
above can be quantified by a variety of norms [III.64],
which assign a nonnegative number (or $+\infty$) to any
given function and which provide some quantitative
measure of one characteristic of that function. Besides
being of importance in pure harmonic analysis, quantitative
estimates involving these norms are also useful
in applied mathematics, for instance in performing an
error analysis of some numerical algorithm.

Functions tend to have infinitely many degrees of
freedom, and it is thus unsurprising that the number
of norms one can place on a function is infinite as well:
there are many ways of quantifying how “large” a function
is. These norms can often differ quite dramatically
from each other. For instance, if a function f is very
large for just a few values, so that its graph has tall,
thin “spikes,” then it will have a very large $L^\infty$-norm,
but $\int |f(x)|\,dx$, its $L^1$-norm, may well be quite small.
Conversely, if f has a very broad and spread-out graph,
then it is possible for $\int |f(x)|\,dx$ to be very large even
if $|f(x)|$ is small for every x: such a function has a
large $L^1$-norm but a small $L^\infty$-norm. Similar examples
can be constructed to show that the $L^2$-norm sometimes
behaves very differently from either the $L^1$-norm
or the $L^\infty$-norm. However, it turns out that the $L^2$-norm
lies “between” these two norms, in the sense that if one
controls both the $L^1$-norm and the $L^\infty$-norm, then one
also automatically controls the $L^2$-norm. Intuitively, the
reason is that if the $L^\infty$-norm is not too large then one
eliminates all the spiky functions, and if the $L^1$-norm is
small then one eliminates most of the broad functions;
the remaining functions end up being well-behaved in
the intermediate $L^2$-norm. More quantitatively, we have
the inequality

$$\|f\|_{L^2} \le \|f\|_{L^1}^{1/2}\,\|f\|_{L^\infty}^{1/2},$$

which follows easily from the trivial algebraic fact that
if $|f(x)| \le M$, then $|f(x)|^2 \le M|f(x)|$. This inequality
is a special case of Hölder’s inequality [V.22], which
is one of the fundamental inequalities in harmonic
analysis. The idea that control of two “extreme” norms
automatically implies further control on “intermediate”
norms can be generalized tremendously and leads
to very powerful and convenient methods known as
interpolation, which is another basic tool in this area.
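
The contrast between spiky and broad functions, and the way the $L^2$-norm is controlled by the other two, can be seen numerically. The sketch below is a rough illustration, assuming midpoint Riemann sums on a finite interval in place of the true integrals; the two test functions `spike` and `broad` and the helper `norms` are made up for the purpose and are not taken from the text.

```python
import math


def norms(f, a, b, steps=100_000):
    """Approximate the L^1, L^2 and L^infinity norms of f on [a, b].

    The integrals are replaced by midpoint Riemann sums and the supremum
    by the largest sampled value, so all three are only approximations.
    """
    dx = (b - a) / steps
    values = [abs(f(a + (k + 0.5) * dx)) for k in range(steps)]
    l1 = sum(values) * dx
    l2 = math.sqrt(sum(v * v for v in values) * dx)
    linf = max(values)
    return l1, l2, linf


def spike(x):
    """A tall, thin triangle: height 100, supported on |x| < 0.001."""
    return max(0.0, 100.0 * (1.0 - abs(x) / 0.001))


def broad(x):
    """A low, wide bump: height 0.01, spread over a scale of about 1000."""
    return 0.01 * math.exp(-((x / 1000.0) ** 2))


spike_l1, spike_l2, spike_linf = norms(spike, -1.0, 1.0)
broad_l1, broad_l2, broad_linf = norms(broad, -5000.0, 5000.0)

# The interpolation inequality ||f||_2 <= ||f||_1^(1/2) * ||f||_inf^(1/2).
spike_ok = spike_l2 <= math.sqrt(spike_l1 * spike_linf) + 1e-9
broad_ok = broad_l2 <= math.sqrt(broad_l1 * broad_linf) + 1e-9
```

The spike has a small $L^1$-norm and a large $L^\infty$-norm, the broad bump has the reverse, and in both cases the $L^2$-norm stays below the geometric mean of the other two. The discrete sums satisfy the same inequality exactly, for the same algebraic reason as in the continuous case.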

The study of a single function and all its norms
eventually gets somewhat tiresome, though. Nearly all 
fields of mathematics become a lot more interesting
when one considers not just objects, but also maps 
between objects. In our case, the objects in question 
are functions, and, as was mentioned in the introduc- 
tion, a map that takes functions to functions is usually 
referred to as an operator. (In some contexts it is also 
called a transform [III.93].) Operators may seem like 
fairly complicated mathematical objects— their inputs 
and outputs are functions, which in turn have inputs 
and outputs that are usually numbers— but they are in 
fact a very natural concept since there are many situations
where one wants to transform functions. For
example, differentiation can be thought of as an operator,
which takes a function f to its derivative $df/dx$.
This operator has a well-known (partial) inverse, integration,
which takes f to the function F that is defined
by the formula

$$F(x) = \int_{-\infty}^{x} f(y)\,dy.$$

A less intuitive, but particularly important, example
is the Fourier transform [III.27]. This takes f to a
function $\hat{f}$, given by the formula

$$\hat{f}(x) = \int e^{-2\pi i x y} f(y)\,dy.$$

It is also of interest to consider operators that take 
two or more inputs. Two particularly common examples
are the pointwise product and convolution. If f and
g are two functions, then their pointwise product fg
is defined in the obvious way:

$$(fg)(x) = f(x)g(x).$$

The convolution, denoted $f * g$, is defined as follows:

$$f * g(x) = \int f(y)\,g(x - y)\,dy.$$
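
A discrete analogue may help to fix the two definitions. In the sketch below, which is an illustration only, a function is replaced by a finite list of samples (an assumption made for the sake of the example), the integral in the convolution becomes a finite sum, and the names `pointwise_product` and `convolve` are hypothetical.

```python
import math


def pointwise_product(f, g):
    """Multiply two equally long sample lists entry by entry."""
    return [a * b for a, b in zip(f, g)]


def convolve(f, g):
    """Discrete convolution: out[k] is the sum of f[j] * g[k - j] over j."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def l2_norm(f):
    """Square root of the sum of squares of the samples."""
    return math.sqrt(sum(a * a for a in f))


f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]

product_fg = pointwise_product(f, g)
conv_fg = convolve(f, g)
conv_gf = convolve(g, f)

# The Cauchy-Schwarz inequality bounds every entry of the convolution
# by the product of the two L^2-type norms.
sup_bound_holds = max(abs(v) for v in conv_fg) <= l2_norm(f) * l2_norm(g)
```

Unlike the pointwise product, each value of the convolution mixes all the values of both inputs, and the two lists `conv_fg` and `conv_gf` agree, reflecting the symmetry of the definition under the substitution of $x - y$ for $y$.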

This is just a very small sample of interesting opera- 
tors that one might look at. The original purpose of 
harmonic analysis was to understand the operators 
that were connected to Fourier analysis, real analysis, 
and complex analysis. Nowadays, however, the subject 
has grown considerably, and the methods of harmonic 
analysis have been brought to bear on a much broader 
set of operators. For example, they have been partic- 
ularly fruitful in understanding the solutions of vari- 
ous linear and nonlinear partial differential equations, 
since the solution of any such equation can be viewed 
as an operator applied to the initial conditions. They 
are also very useful in analytic and combinatorial num- 
ber theory, when one is faced with understanding the 
oscillation present in various expressions such as expo- 
nential sums. Harmonic analysis has also been applied 
to analyze operators that arise in geometric measure 
theory, probability theory, ergodic theory, numerical 
analysis, and differential geometry. 

A primary concern of harmonic analysis is to obtain 
both qualitative and quantitative information about 
the effects of these operators on generic functions. 
A typical example of a quantitative estimate is the 
inequality 

$$\|f * g\|_{L^\infty} \le \|f\|_{L^2}\,\|g\|_{L^2},$$

which is true for all $f, g \in L^2$. This result, which is a
special case of Young’s inequality, is easy to prove: one
just writes out the definition of $f * g(x)$ and applies
the Cauchy-Schwarz inequality [V.22]. As a consequence,
one can draw the qualitative conclusion that
the convolution of two functions in $L^2$ is always continuous.
Let us briefly sketch the argument, since it is
an instructive one.

A fundamental fact about functions in $L^2$ is that any
such function f can be approximated arbitrarily well
(in the $L^2$-norm) by a function $\tilde{f}$ that is continuous and
compactly supported. (The second condition means that
$\tilde{f}$ takes the value zero everywhere outside some interval
$[-M, M]$.) Given any two functions f and g in $L^2$, let
$\tilde{f}$ and $\tilde{g}$ be approximations of this kind. It is an exercise
in real analysis to prove that $\tilde{f} * \tilde{g}$ is continuous, and
it follows easily from the inequality above that $\tilde{f} * \tilde{g}$ is
close to $f * g$ in the $L^\infty$-norm, since

$$f * g - \tilde{f} * \tilde{g} = f * (g - \tilde{g}) + (f - \tilde{f}) * \tilde{g}.$$

Therefore, $f * g$ can be approximated arbitrarily well
in the $L^\infty$-norm by continuous functions. A standard
result in basic real analysis (that a uniform limit of continuous
functions is continuous) now tells us that $f * g$
is continuous.
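
For the reader who wants the two estimates used in this argument written out, here is one way to do it. This is a sketch rather than a quotation: the tildes denote the continuous, compactly supported approximants of f and g, and the only tools are the Cauchy-Schwarz inequality, the translation and reflection invariance of the integral, and the triangle inequality.

```latex
\begin{align*}
|f * g(x)|
  &= \Bigl|\int f(y)\,g(x-y)\,dy\Bigr|
   \le \Bigl(\int |f(y)|^2\,dy\Bigr)^{1/2}
       \Bigl(\int |g(x-y)|^2\,dy\Bigr)^{1/2}
   = \|f\|_{L^2}\,\|g\|_{L^2}, \\
\|f * g - \tilde f * \tilde g\|_{L^\infty}
  &\le \|f * (g - \tilde g)\|_{L^\infty}
     + \|(f - \tilde f) * \tilde g\|_{L^\infty}
   \le \|f\|_{L^2}\,\|g - \tilde g\|_{L^2}
     + \|f - \tilde f\|_{L^2}\,\|\tilde g\|_{L^2}.
\end{align*}
```

Both terms on the right of the second line can be made as small as one likes, since the norm of $\tilde g$ stays bounded as the approximation improves.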

Notice the general structure of this argument, which 
occurs frequently in harmonic analysis. First, one iden- 
tifies a “simple” class of functions for which one can 
easily prove the result one wants. Next, one proves that 
every function in a much wider class can be approxi- 
mated in a suitable sense by simple functions. Finally, 
one uses this information to deduce that the result 
holds for functions in the wider class as well. In our 
case, the simple functions were the continuous func- 
tions of finite support, the wider class consisted of 
square-integrable functions, and the suitable sense of 
approximation was closeness in the $L^2$-norm.

We shall give some further examples of qualita- 
tive and quantitative analysis of operators in the next 
section. 

2 Example: Fourier Summation 

To illustrate the interplay between quantitative and 
qualitative results, we shall now sketch some of the 
basic theory of summation of Fourier series, which his- 
torically was one of the main motivations for studying 
harmonic analysis. 

In this section, we shall consider functions $f$ that are
periodic with period $2\pi$: that is, functions such that
$f(x + 2\pi) = f(x)$ for all $x$. An example of such a
function is $f(x) = 3 + \sin(x) - 2\cos(3x)$. A function like
this, which can be written as a finite linear combination
of functions of the form $\sin(nx)$ and $\cos(nx)$, is
called a trigonometric polynomial. The word “polynomial”
is used here because any such function can be expressed as
a polynomial in $\sin(x)$ and $\cos(x)$, or alternatively,
and somewhat more conveniently, as a polynomial in $e^{ix}$
and $e^{-ix}$. That is, it can be written as
$\sum_{n=-N}^{N} c_n e^{inx}$ for some $N$ and some choice of
coefficients $(c_n : -N \le n \le N)$. If we know that $f$
can be expressed in this form, then we can work out the
coefficient $c_n$ quite easily: it is given by the formula

$$c_n = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx.$$
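
As a small numerical illustration of the coefficient formula, the following Python sketch approximates the integral by an equally spaced Riemann sum and applies it to the example $f(x) = 3 + \sin(x) - 2\cos(3x)$ from the text. The function names and the number of sample points are assumptions made for this sketch, not part of the original discussion.

```python
import cmath
import math


def fourier_coefficient(f, n, samples=64):
    """Approximate (1/2pi) * integral_0^{2pi} f(x) e^{-inx} dx.

    Uses an equally spaced Riemann sum, which is exact (up to rounding)
    for trigonometric polynomials of degree well below `samples`.
    """
    total = 0j
    for j in range(samples):
        x = 2 * math.pi * j / samples
        total += f(x) * cmath.exp(-1j * n * x)
    return total / samples


def example(x):
    # The trigonometric polynomial used as an example in the text.
    return 3 + math.sin(x) - 2 * math.cos(3 * x)


if __name__ == "__main__":
    for n in range(-3, 4):
        c = fourier_coefficient(example, n)
        print(n, round(c.real, 6), round(c.imag, 6))
```

Writing $\sin(x) = (e^{ix} - e^{-ix})/2i$ and $\cos(3x) = (e^{3ix} + e^{-3ix})/2$ shows what to expect: $c_0 = 3$, $c_1 = -i/2$, $c_{-1} = i/2$, $c_{\pm 3} = -1$, and all other coefficients are zero.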

It is a remarkable and very important fact that we can
say something similar about a much wider class of
functions, if, that is, we now allow infinite linear
combinations. Suppose that $f$ is a periodic function that
is also continuous (or, more generally, that $f$ is
absolutely integrable, meaning that the integral of
$|f(x)|$ between $0$ and $2\pi$ is finite). We can then
define the Fourier coefficients $\hat{f}(n)$ of $f$, using
exactly the formula we had above for $c_n$:

$$\hat{f}(n) = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx.$$

The example of trigonometric polynomials now sug- 
gests that one should have the identity 

$$f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{inx},$$

expressing f as a sort of “infinite trigonometric poly- 
nomial,” but this is not always true, and even when it is 
true it takes some effort to justify it rigorously, or even 
to say precisely what the infinite sum means. 

To make the question more precise, let us introduce
for each natural number $N$ the Dirichlet summation
operator $S_N$. This takes a function $f$ to the function
$S_N f$ that is defined by the formula

$$S_N f(x) = \sum_{n=-N}^{N} \hat{f}(n) e^{inx}.$$

The question we would like to answer is whether $S_N f$
converges to $f$ as $N \to \infty$. The answer turns out to
be surprisingly complicated: not only does it depend
on the assumptions that one places on the function $f$,
but it also depends critically on how one defines
“convergence.” For example, if we assume that $f$ is
continuous and ask for the convergence to be uniform,
then the answer is very definitely no: there are
examples of continuous functions $f$ for which $S_N f$ does
not even converge pointwise to $f$. However, if we ask for
a weaker form of convergence, the answer is yes: $S_N f$
will necessarily converge to $f$ in the $L^p$ topology for
any $0 < p < \infty$, and even though it does not have
to converge pointwise, it will converge almost
everywhere, meaning that the set of $x$ for which
$S_N f(x)$ does not converge to $f(x)$ has measure [III.57]
zero. If instead one assumes only that $f$ is absolutely
integrable, then it is possible for the partial sums
$S_N f$ to diverge at every single point $x$, as well as
being divergent in the $L^p$ topology for every $p$ such
that $0 < p < \infty$. The proofs of all of these results
ultimately rely on very quantitative results in harmonic
analysis, and in particular on various $L^p$-type
estimates on the Dirichlet sum $S_N f(x)$, as well as
estimates connected with the closely related maximal
operator, which takes $f$ to the function
$\sup_{N>0} |S_N f(x)|$.

As these results are a little tricky to prove, let us first
discuss a simpler result, in which the Dirichlet
summation operators $S_N$ are replaced by the Fejér
summation operators $F_N$. For each $N$, the operator $F_N$
is the average of the first $N$ Dirichlet operators: that
is, it is given by the formula

$$F_N = \frac{1}{N}(S_0 + \cdots + S_{N-1}).$$

It is not hard to show that if $S_N f$ converges to $f$, then
so does $F_N f$. However, by averaging the $S_N f$ we allow
cancellations to take place that sometimes make it
possible for $F_N f$ to converge to $f$ even when $S_N f$
does not. Indeed, here is a sketch of a proof that $F_N f$
converges to $f$ whenever $f$ is continuous and periodic,
which, as we have seen, is far from true of $S_N f$.

In its basic structure, the argument is similar to the
one we used when showing that the convolution of
two functions in $L^2$ is continuous. Note first that the
result is easy to prove when $f$ is a trigonometric
polynomial, since then $S_N f = f$ for every $N$ from some
point onward. Now the Weierstrass approximation
theorem says that every continuous periodic function $f$
can be uniformly approximated by trigonometric
polynomials: that is, for every $\varepsilon > 0$ there is a
trigonometric polynomial $g$ such that
$\|f - g\|_{L^\infty} \le \varepsilon$. We know that
$F_N g$ is close to $g$ for large $N$ (since $g$ is a
trigonometric polynomial), and would like to deduce the
same for $f$.

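The first step is to use some routine trigonometric
manipulation to prove the identity

$$F_N f(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\sin^2(\tfrac12 N y)}{N \sin^2(\tfrac12 y)}\, f(x - y)\,dy.$$

The exact form of this expression matters less than two
properties of the function

$$u(y) = \frac{\sin^2(\tfrac12 N y)}{N \sin^2(\tfrac12 y)}$$

that we shall use. One is that $u(y)$ is always
nonnegative and the other is that
$\frac{1}{2\pi} \int_{-\pi}^{\pi} u(y)\,dy = 1$. These two
facts allow us to say that

$$|F_N h(x)| = \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} u(y) h(x - y)\,dy \right| \le \|h\|_{L^\infty} \cdot \frac{1}{2\pi} \int_{-\pi}^{\pi} u(y)\,dy = \|h\|_{L^\infty}.$$

That is, $\|F_N h\|_{L^\infty} \le \|h\|_{L^\infty}$ for any bounded function $h$.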

To apply this result, we choose a trigonometric
polynomial $g$ such that $\|f - g\|_{L^\infty} \le \varepsilon$
and let $h = f - g$. Then we find that
$\|F_N h\|_{L^\infty} = \|F_N f - F_N g\|_{L^\infty} \le \varepsilon$
as well. As mentioned above, if we choose $N$ large enough,
then $\|F_N g - g\|_{L^\infty} \le \varepsilon$, and then we
use the triangle inequality [V.22] to say that

$$\|F_N f - f\|_{L^\infty} \le \|F_N f - F_N g\|_{L^\infty} + \|F_N g - g\|_{L^\infty} + \|g - f\|_{L^\infty}.$$

Since each term on the right-hand side is at most
$\varepsilon$, this shows that $\|F_N f - f\|_{L^\infty}$ is
at most $3\varepsilon$. And since $\varepsilon$ can be made
arbitrarily small, this shows that $F_N f$ converges to $f$.
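
The bound $\|F_N h\|_{L^\infty} \le \|h\|_{L^\infty}$ used in the argument above can be observed numerically. The following Python sketch is an illustration under stated assumptions: it takes the (bounded, discontinuous) square wave $h(x) = \operatorname{sign}(\sin x)$, whose Fourier series $\frac{4}{\pi} \sum_{k \text{ odd}} \sin(kx)/k$ is standard, and compares the Dirichlet sums with the Fejér means on a grid. The function names, the choice $N = 30$, and the grid size are hypothetical choices for this sketch.

```python
import math


def square_wave_coefficient(k):
    """Sine coefficient b_k of h(x) = sign(sin x): 4/(pi*k) for odd k, else 0."""
    return 4 / (math.pi * k) if k % 2 == 1 else 0.0


def dirichlet_sum(N, x):
    """S_N h(x) for the square wave, written with real sines."""
    return sum(square_wave_coefficient(k) * math.sin(k * x) for k in range(1, N + 1))


def fejer_mean(N, x):
    """F_N h(x) = (S_0 h(x) + ... + S_{N-1} h(x)) / N."""
    return sum(dirichlet_sum(M, x) for M in range(N)) / N


def sup_on_grid(g, points=1000):
    """Largest value of |g| on an equally spaced grid in [0, 2*pi)."""
    return max(abs(g(2 * math.pi * j / points)) for j in range(points))


if __name__ == "__main__":
    N = 30
    print("sup |S_N h| ~", sup_on_grid(lambda x: dirichlet_sum(N, x)))
    print("sup |F_N h| ~", sup_on_grid(lambda x: fejer_mean(N, x)))
```

The Dirichlet sums overshoot the value 1 near the jumps of $h$ (the Gibbs phenomenon), whereas the Fejér means never exceed $\|h\|_{L^\infty} = 1$, because they are averages of $h$ against a nonnegative kernel of total mass 1.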

A similar argument (using Minkowski’s integral
inequality [V.22] instead of the triangle inequality)
shows that $\|F_N f\|_{L^p} \le \|f\|_{L^p}$ for all
$1 < p < \infty$, $f \in L^p$, and $N \ge 1$. As a
consequence, one can modify the above argument to show that
$F_N f$ converges to $f$ in the $L^p$ topology for every
$f \in L^p$. A slightly more difficult result (relying on a
basic result in harmonic analysis known as the
Hardy-Littlewood maximal inequality) asserts that, for
every $1 < p < \infty$, there exists a constant $C_p$ such
that one has the inequality
$\|\sup_N |F_N f|\|_{L^p} \le C_p \|f\|_{L^p}$ for all
$f \in L^p$; as a consequence, one can show that $F_N f$
converges to $f$ almost everywhere for every $f \in L^p$
and $1 < p < \infty$. A slight modification of this
argument also allows one to treat the endpoint case when
$f$ is merely assumed to be absolutely integrable; see the
discussion on the Hardy-Littlewood maximal inequality at
the end of this article.

Now let us return briefly to Dirichlet summation.
Using fairly sophisticated techniques in harmonic
analysis (such as Calderón-Zygmund theory) one can
show that when $1 < p < \infty$ the Dirichlet operators
$S_N$ are bounded in $L^p$ uniformly in $N$. In other
words, for every $p$ in this range there exists a positive
real number $C_p$ such that
$\|S_N f\|_{L^p} \le C_p \|f\|_{L^p}$ for every function
$f$ in $L^p$ and every nonnegative integer $N$. As a
consequence, one can show that $S_N f$ converges to $f$ in
the $L^p$ topology for all $f$ in $L^p$ and every $p$ such
that $1 < p < \infty$. However, the quantitative estimate
on $S_N$ fails at the endpoints $p = 1$ and $p = \infty$,
and from this one can show that the convergence result
also fails at these endpoints (either by explicitly
constructing a counterexample or by using general results
such as the so-called uniform boundedness principle).

What happens if we ask for $S_N f$ to converge to
$f$ almost everywhere? Almost-everywhere convergence
does not follow from convergence in $L^p$ when
$p < \infty$, so we cannot use the above results to prove
it. It turns out to be a much harder question, and was a
famous open problem, eventually answered by Carleson’s
theorem [V.5] and an extension of it by Hunt.
Carleson proved that one has an estimate of the form
$\|\sup_N |S_N f|\|_{L^p} \le C_p \|f\|_{L^p}$ in the case
$p = 2$, and Hunt generalized the proof to cover all $p$
with $1 < p < \infty$. This result implies that the
Dirichlet sums of an $L^p$ function do indeed converge
almost everywhere when $1 < p \le \infty$. On the other
hand, this estimate fails at the endpoint $p = 1$, and
there is in fact an example due to Kolmogorov [VI.88] of
an absolutely integrable function whose Dirichlet sums are
everywhere divergent. These results require a lot of
harmonic analysis theory. In particular they use many
decompositions of both the spatial variable and the
frequency variable, keeping the Heisenberg uncertainty
principle in mind. They then carefully reassemble the
pieces, exploiting various manifestations of
orthogonality.

To summarize, quantitative estimates such as $L^p$
estimates on various operators provide an important
route to establishing qualitative results, such as
convergence of certain series or sequences. In fact there
are a number of principles (notably the uniform
boundedness principle and a result known as Stein’s
maximal principle) which assert that in certain
circumstances this is the only route, in the sense that a
quantitative estimate must exist in order for the
qualitative result to be true.

3 Some General Themes in Harmonic Analysis: Decomposition, Oscillation, and Geometry

One feature of harmonic analysis methods is that they
tend to be local rather than global. For instance, if one
is analyzing a function $f$ it is quite common to
decompose it as a sum $f = f_1 + \cdots + f_k$, with each
function $f_i$ “localized” in the sense that its support
(the set of values $x$ for which $f_i(x) \ne 0$) has a
small diameter. This would be called localization in the
spatial variable. One can also localize in the frequency
variable by applying the process to the Fourier transform
$\hat{f}$ of $f$. Having split $f$ up like this, one can
carry out estimates for the pieces separately and then
recombine them later. One reason for this “divide and
conquer” strategy is that a typical function $f$ tends to
have many different features (for example, it may be very
“spiky,” “discontinuous,” or “high frequency” in some
places, and “smooth” or “low frequency” in others) and it
is difficult to treat all of these features at once. A
well-chosen decomposition of the function $f$ can isolate
these features from each other, so that each component has
only one salient feature that could cause difficulty: the
spiky part can go into one $f_i$, the high-frequency part
into another, and so on. In reassembling the estimates from
the individual components, one can use crude tools
such as the triangle inequality or more refined tools,
for instance those relying on some sort of orthogonality,
or perhaps a clever algorithm that groups the components
into manageable clusters. The main drawback of the
decomposition method (other than an aesthetic one) is that
it tends to give bounds that are not quite optimal;
however, in many cases one is content with an estimate
that differs from the best possible one by a
multiplicative constant.

To give a simple example of the method of
decomposition, let us consider the Fourier transform
$\hat{f}(\xi)$ of a function $f : \mathbb{R} \to \mathbb{C}$,
defined (for suitably nice functions $f$) by the formula

$$\hat{f}(\xi) = \int_{\mathbb{R}} f(x) e^{-2\pi i x \xi}\,dx.$$

What can we say about the size of $\hat{f}$, as measured by
suitable norms, if we are given information about the
size of $f$, as measured by other norms?

Here are two simple observations in response to this
question. First, since the modulus of $e^{-2\pi i x \xi}$ is
always equal to 1, it follows that $|\hat{f}(\xi)|$ is at
most $\int_{\mathbb{R}} |f(x)|\,dx$. This tells us that
$\|\hat{f}\|_{L^\infty} \le \|f\|_{L^1}$, at least if
$f \in L^1$. In particular, $\hat{f} \in L^\infty$.
Secondly, the Plancherel theorem, a very basic fact of
Fourier analysis, tells us that $\|\hat{f}\|_{L^2}$ is
equal to $\|f\|_{L^2}$ if $f \in L^2$. Therefore, if $f$
belongs to $L^2$ then so does $\hat{f}$.

We would now like to know what happens if $f$ lies in
an intermediate $L^p$ space. In other words, what happens
if $1 < p < 2$? Since $L^p$ is not contained in either
$L^1$ or $L^2$, one cannot use either of the above two
results directly. However, let us take a function
$f \in L^p$ and consider what the difficulty is. The
reason $f$ may not lie in $L^1$ is that it may decay too
slowly: for instance, the function
$f(x) = (1 + |x|)^{-3/4}$ tends to zero more slowly than
$1/x$ as $x \to \infty$, so its integral is infinite.
However, if we raise $f$ to the power $3/2$ we obtain the
function $(1 + |x|)^{-9/8}$, which decays quickly enough to
have a finite integral, so $f$ does belong to $L^{3/2}$.
Similar examples show that the reason $f$ may fail to
belong to $L^2$ is that it can have places where it tends
to infinity slowly enough for the integral of $|f|^p$ to be
finite but not slowly enough for the integral of $|f|^2$
to be finite.

Notice that these two reasons are completely
different. Therefore, we can try to decompose $f$ into two
pieces, one consisting of the part where $f$ is large and
the other consisting of the part where $f$ is small. That
is, we can choose some threshold $\lambda$ and define
$f_1(x)$ to be $f(x)$ when $|f(x)| < \lambda$ and 0
otherwise, and define $f_2(x)$ to be $f(x)$ when
$|f(x)| \ge \lambda$ and 0 otherwise. Then
$f_1 + f_2 = f$, and $f_1$ and $f_2$ are the “small part”
and “large part” of $f$, respectively.

Because $|f_1(x)| < \lambda$ for every $x$, we find that

$$|f_1(x)|^2 = |f_1(x)|^{2-p} |f_1(x)|^p \le \lambda^{2-p} |f_1(x)|^p.$$

Therefore, $f_1$ belongs to $L^2$ and
$\|f_1\|_{L^2}^2 \le \lambda^{2-p} \|f_1\|_{L^p}^p$.
Similarly, because $|f_2(x)| \ge \lambda$ whenever
$f_2(x) \ne 0$, we have the inequality
$|f_2(x)| \le |f_2(x)|^p / \lambda^{p-1}$ for every $x$,
which tells us that $f_2$ belongs to $L^1$ and that
$\|f_2\|_{L^1} \le \|f_2\|_{L^p}^p / \lambda^{p-1}$.
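
The splitting at a threshold and the two pointwise inequalities can be checked on sampled data. The following Python sketch is only an illustration: the function names, the sample values, and the choices $\lambda = 0.5$ and $p = 1.5$ are assumptions made here, and the check is performed on finitely many samples rather than on a function on $\mathbb{R}$.

```python
def split_at_threshold(values, lam):
    """Split sampled values of f into a small part f1 and a large part f2.

    f1 keeps the samples with |f| < lam, f2 keeps those with |f| >= lam;
    the other entries are set to 0, so f1 + f2 reproduces f.
    """
    small = [v if abs(v) < lam else 0.0 for v in values]
    large = [v if abs(v) >= lam else 0.0 for v in values]
    return small, large


def check_pointwise_bounds(values, lam, p):
    """Check the two pointwise inequalities used in the text, for 1 < p < 2."""
    small, large = split_at_threshold(values, lam)
    tol = 1e-12  # slack for floating-point rounding in the equality case
    small_ok = all(abs(v) ** 2 <= lam ** (2 - p) * abs(v) ** p + tol for v in small)
    large_ok = all(abs(v) <= abs(v) ** p / lam ** (p - 1) + tol for v in large)
    return small_ok and large_ok


if __name__ == "__main__":
    # Hypothetical samples: a slowly decaying tail plus two tall spikes.
    samples = [(1 + abs(x)) ** -0.75 for x in range(-50, 51)] + [40.0, -7.5]
    print(check_pointwise_bounds(samples, lam=0.5, p=1.5))
```

Summing the first inequality over all samples corresponds to the bound on $\|f_1\|_{L^2}^2$, and summing the second corresponds to the bound on $\|f_2\|_{L^1}$.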

From our knowledge about the $L^2$-norm of $f_1$ and
the $L^1$-norm of $f_2$ we can obtain upper bounds for
the $L^2$-norm of $\hat{f}_1$ and the $L^\infty$-norm of
$\hat{f}_2$, by our remarks above. By using this strategy
for every $\lambda$ and combining the results in a clever
way, one can obtain the Hausdorff-Young inequality, which
is the following assertion. Let $p$ lie between 1 and 2
and let $p'$ be the dual exponent of $p$, which is the
number $p/(p - 1)$. Then there is a constant $C_p$ such
that, for every function $f \in L^p$, one has the
inequality $\|\hat{f}\|_{L^{p'}} \le C_p \|f\|_{L^p}$.
The particular decomposition method we have used to
obtain this result is formally known as the method of
real interpolation. It does not give the best possible
value of $C_p$, which turns out to be
$p^{1/2p} / (p')^{1/2p'}$; finding that value requires
more delicate methods.

Another basic theme in harmonic analysis is the
attempt to quantify the elusive phenomenon of
oscillation. Intuitively, if an expression oscillates
wildly, then we expect its average value to be relatively
small in magnitude, since the positive and negative parts,
or in the complex case the parts with a wide range of
different arguments, will cancel out. For instance, if a
$2\pi$-periodic function $f$ is smooth, then for large $n$
the Fourier coefficient

$$\hat{f}(n) = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx$$

will be very small since $\int_0^{2\pi} e^{-inx}\,dx = 0$
and the comparatively slow variation in $f(x)$ is not
enough to stop the cancellation occurring. This assertion
can easily be proved rigorously by repeated integration by
parts. Generalizations of this phenomenon include the
so-called principle of stationary phase, which among
other things allows one to obtain precise control on
the Airy function $\mathrm{Ai}(x)$ discussed earlier. It
also yields the Heisenberg uncertainty principle, which
relates the decay and smoothness of a function to the decay
and smoothness of its Fourier transform.

A somewhat different manifestation of oscillation
lies in the principle that if one has a sequence of
functions that oscillate in different ways, then their sum
should be significantly smaller than the bound that
follows from the triangle inequality. Again, this is the
result of cancellation that is simply not noticed by the
triangle inequality. For instance, the Plancherel
theorem in Fourier analysis implies, among other things,
that a trigonometric polynomial
$\sum_{n=-N}^{N} c_n e^{inx}$ has an $L^2$-norm of

$$\Bigl( \sum_{n=-N}^{N} |c_n|^2 \Bigr)^{1/2}.$$

This bound (which can also be proved by direct
calculation) is smaller than the upper bound of
$\sum_{n=-N}^{N} |c_n|$ that would be obtained if we simply
applied the triangle inequality to the functions
$c_n e^{inx}$. This identity can be viewed as a special
case of the Pythagorean theorem, together with the
observation that the harmonics $e^{inx}$ are all orthogonal
to each other with respect to the inner product [III.37]

$$\langle f, g \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(x) \overline{g(x)}\,dx.$$

This concept of orthogonality has been generalized in
a number of ways. For instance, there is a more general
and robust concept of “almost orthogonality,” which
roughly speaking means that the inner products of a
collection of functions are small but not necessarily 0.
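
To see how much is gained over the triangle inequality, the following Python sketch compares three quantities for a trigonometric polynomial $\sum_n c_n e^{inx}$: its $L^2$-norm computed numerically with respect to the normalized inner product above, the value $(\sum_n |c_n|^2)^{1/2}$ given by the Plancherel theorem, and the cruder bound $\sum_n |c_n|$. The function names and the example coefficients are assumptions made for illustration.

```python
import cmath
import math


def l2_norm(coeffs, samples=256):
    """L2 norm of sum_n c_n e^{inx} w.r.t. (1/2pi) * integral over [0, 2pi].

    `coeffs` maps each frequency n to its coefficient c_n. The integral is
    approximated by an equally spaced Riemann sum.
    """
    total = 0.0
    for j in range(samples):
        x = 2 * math.pi * j / samples
        value = sum(c * cmath.exp(1j * n * x) for n, c in coeffs.items())
        total += abs(value) ** 2
    return math.sqrt(total / samples)


def plancherel_value(coeffs):
    """The value (sum |c_n|^2)^(1/2) predicted by the Plancherel theorem."""
    return math.sqrt(sum(abs(c) ** 2 for c in coeffs.values()))


def triangle_bound(coeffs):
    """The cruder bound sum |c_n| given by the triangle inequality."""
    return sum(abs(c) for c in coeffs.values())


if __name__ == "__main__":
    # Hypothetical example: c_n = 1 for every |n| <= 8.
    coeffs = {n: 1.0 for n in range(-8, 9)}
    print(l2_norm(coeffs), plancherel_value(coeffs), triangle_bound(coeffs))
```

With $c_n = 1$ for $|n| \le 8$ there are 17 terms, so the $L^2$-norm is $\sqrt{17}$ while the triangle inequality only gives 17.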

Many arguments in harmonic analysis will, at some
point, involve a combinatorial statement about certain
types of geometric objects such as cubes, balls, or
boxes. For instance, one useful such statement is the
Vitali covering lemma, which asserts that, given any
collection $B_1, \ldots, B_k$ of balls in Euclidean space
$\mathbb{R}^n$, there will be a subcollection
$B_{i_1}, \ldots, B_{i_m}$ of balls that are disjoint, but
that nevertheless contain a significant fraction of the
volume covered by the original balls. To be precise, one
can choose the disjoint balls so that

$$\operatorname{vol}\Bigl( \bigcup_{j=1}^{m} B_{i_j} \Bigr) \ge 5^{-n} \operatorname{vol}\Bigl( \bigcup_{i=1}^{k} B_i \Bigr).$$

(The constant $5^{-n}$ can be improved, but this will not
concern us here.) This result is obtained by a “greedy
algorithm”: one picks balls one by one, at each stage
choosing the largest ball among the $B_j$ that is disjoint
from all the balls already selected.
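
The greedy selection just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: balls are represented as hypothetical `(center, radius)` pairs, they are treated as open, and ties between balls of equal radius are broken by their order in the input.

```python
import math


def greedy_disjoint_balls(balls):
    """Greedy selection described in the text.

    `balls` is a list of (center, radius) pairs, where `center` is a tuple
    of coordinates. Balls are treated as open, so two balls are disjoint
    when the distance between centers is at least the sum of the radii.
    Scanning in order of decreasing radius and keeping every ball that
    misses all balls kept so far is the same as repeatedly picking the
    largest remaining ball that is disjoint from those already selected.
    """
    selected = []
    for center, radius in sorted(balls, key=lambda b: b[1], reverse=True):
        if all(math.dist(center, c) >= radius + r for c, r in selected):
            selected.append((center, radius))
    return selected


if __name__ == "__main__":
    # Hypothetical one-dimensional example: "balls" are intervals.
    intervals = [((0.0,), 1.0), ((1.5,), 1.0), ((3.0,), 0.5), ((0.2,), 0.3)]
    print(greedy_disjoint_balls(intervals))
```

In this one-dimensional example the four intervals cover a set of total length 4.5, and the two selected intervals are disjoint with total length 3, comfortably above the guaranteed fraction $5^{-1}$.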

One consequence of the Vitali covering lemma is
the Hardy-Littlewood maximal inequality, which we will
briefly describe. Given any function
$f \in L^1(\mathbb{R}^n)$, any $x \in \mathbb{R}^n$, and
any $r > 0$, we can calculate the average of $|f|$ in the
$n$-dimensional ball $B(x, r)$ of center $x$ and radius
$r$. Next, we can define the maximal function $F$ of $f$ by
letting $F(x)$ be the largest of all these averages as $r$
ranges over all positive real numbers. (More precisely,
one takes the supremum.) Then, for each positive real
number $\lambda$ one can define a set $X_\lambda$ to be the
set of all $x$ such that $F(x) > \lambda$. The
Hardy-Littlewood maximal inequality asserts that the
volume of $X_\lambda$ is at most
$5^n \|f\|_{L^1} / \lambda$.²

To prove it, one observes that $X_\lambda$ can be covered
by balls $B(x, r)$ on each of which the integral of $|f|$
is at least $\lambda \operatorname{vol}(B(x, r))$. To this
collection of balls one can then apply the Vitali covering
lemma, and the result follows. The Hardy-Littlewood
maximal inequality is a quantitative result, but it has as
a qualitative consequence the Lebesgue differentiation
theorem, which asserts the following. If $f$ is any
absolutely integrable function defined on $\mathbb{R}^n$,
then for almost every $x \in \mathbb{R}^n$ the averages

$$\frac{1}{\operatorname{vol}(B(x, r))} \int_{B(x, r)} f(y)\,dy$$

of $f$ over the Euclidean balls about $x$ tend to $f(x)$
as $r \to 0$. This example demonstrates the importance of
the underlying geometry (in this case, the combinatorics
of metric balls) in harmonic analysis.

2. This version of the Hardy-Littlewood inequality looks somewhat
different from the one mentioned briefly in the previous section, but
one can deduce that inequality from this one by the real interpolation
method discussed earlier.

IV.12 Partial Differential Equations

Sergiu Klainerman


Introduction

Partial differential equations (or PDEs) are an important
class of functional equations: they are equations,
or systems of equations, in which the unknowns are
functions of more than one variable. As a very crude
analogy, PDEs are to functions as polynomial equations
(such as $x^2 + y^2 = 1$, for example) are to numbers.
The distinguishing feature of PDEs, as opposed to
more general functional equations, is that they involve
not only unknown functions, but also various partial
derivatives of those functions, in algebraic combination
with each other and with other, fixed, functions.
Other important kinds of functional equations are
integral equations, which involve various integrals of the
unknown functions, and ordinary differential equations
(ODEs), in which the unknown functions depend on
only one independent variable (such as a time variable
$t$) and the equation involves only ordinary derivatives
$d/dt, d^2/dt^2, d^3/dt^3, \ldots$ of these functions.

Given the immense scope of the subject the best I can
hope to do is to give a very crude perspective on some
of the main issues and an even cruder idea of the
multitude of current research directions. The difficulty
one faces in trying to describe the subject of PDEs starts
with its very definition. Is it a unified area of
mathematics, devoted to the study of a clearly defined set
of objects (in the way that algebraic geometry studies
solutions of polynomial equations or topology studies
manifolds, for example), or is it rather a collection of
separate fields, such as general relativity, several
complex variables, or hydrodynamics, each one vast in its
own right and centered on a particular, very difficult,
equation or class of equations? I will attempt to argue
below that, even though there are fundamental
difficulties in formulating a general theory of PDEs, one
can nevertheless find a remarkable unity between various
branches of mathematics and physics that are centered
on individual PDEs or classes of PDEs. In particular,
certain ideas and methods in PDEs have turned out to be
extraordinarily effective across the boundaries of these
separate fields. It is thus no surprise that the most
successful book ever written about PDEs did not mention
PDEs in its title: it was Methods of Mathematical Physics
by Courant [VI.83] and Hilbert [VI.63].

As it is impossible to do full justice to such a huge 
subject in such limited space I have been forced to leave 
out many topics and relevant details; in particular, I 
have said very little about the fundamental issue of 
breakdown of solutions, and there is no discussion of 
the main open problems in PDEs. A longer and more 
detailed version of the article, which includes these 
topics, can be found at 

http://press.princeton.edu/???? 


1 Basic Definitions and Examples 

The simplest example of a PDE is the Laplace equation
[I.3 §5.4]

$$\Delta u = 0. \tag{1}$$

Here, $\Delta$ is the Laplacian, that is, the differential
operator that transforms functions
$u = u(x_1, x_2, x_3)$ defined from $\mathbb{R}^3$ to
$\mathbb{R}$ according to the rule

$$\Delta u(x_1, x_2, x_3) = \partial_1^2 u(x_1, x_2, x_3) + \partial_2^2 u(x_1, x_2, x_3) + \partial_3^2 u(x_1, x_2, x_3),$$

where $\partial_1$, $\partial_2$, $\partial_3$ are standard
shorthand for the partial derivatives
$\partial/\partial x_1$, $\partial/\partial x_2$,
$\partial/\partial x_3$. (We will use this shorthand
throughout the article.) Two other fundamental examples
(also described in [I.3 §5.4]) are the heat equation and
the wave equation:

$$-\partial_t u + k \Delta u = 0, \tag{2}$$

$$-\partial_t^2 u + c^2 \Delta u = 0. \tag{3}$$

In each case one is asked to find a function $u$ that
satisfies the corresponding equations. For the Laplace
equation $u$ will depend on $x_1$, $x_2$, and $x_3$, and
for the other two it will depend on $t$ as well. Observe
that equations (2) and (3) again involve the symbol
$\Delta$, but also partial derivatives with respect to the
time variable $t$. The constants $k$ (which is positive)
and $c$ are fixed and represent the rate of diffusion and
the speed of light,
respectively. However, from a mathematical point of
view they are not important, since if
$u(t, x_1, x_2, x_3)$ is a solution of (3), for example,
then $v(t, x_1, x_2, x_3) = u(t, c x_1, c x_2, c x_3)$
satisfies the same equation with $c = 1$. Thus, when one
is studying the equations one can set these constants to
be 1. Both equations are called evolution equations
because they are supposed to describe the change of a
particular physical object as the time parameter $t$
varies. Observe that (1) can be interpreted as a
particular case of both (2) and (3): if
$u = u(t, x_1, x_2, x_3)$ is a solution of either (2) or
(3) that is independent of $t$, then $\partial_t u = 0$,
so $u$ must satisfy (1).

In all three examples mentioned above, we tacitly 
assume that the solutions we are looking for are suffi- 
ciently differentiable for the equations to make sense. 

As we shall see later, one of the important
developments in the theory of PDEs was the study of more
refined notions of solutions, such as distributions
[III.18], which require only weak versions of
differentiability.

Here are some further examples of important PDEs. 

The first is the Schrödinger equation [III.85],

$$i \partial_t u + k \Delta u = 0, \tag{4}$$

where $u$ is a function from
$\mathbb{R} \times \mathbb{R}^3$ to $\mathbb{C}$. This
equation describes the quantum evolution of a massive
particle; here $k = \hbar/2m$, where $\hbar > 0$ is
Planck’s constant and $m$ is the mass of the particle. As
with the heat equation, one can set $k$ to equal 1 after a
simple change of variables. Though the equation is
formally very similar to the heat equation, it has very
different qualitative behavior. This illustrates an
important general point about PDEs: that small changes in
the form of an equation can lead to very different
properties of solutions.

A further example is the Klein-Gordon equation

$$-\partial_t^2 u + c^2 \Delta u - \Bigl( \frac{mc^2}{\hbar} \Bigr)^2 u = 0. \tag{5}$$

This is the relativistic counterpart to the Schrödinger
equation: the parameter $m$ has the physical
interpretation of mass and $mc^2$ has the physical
interpretation of rest energy (reflecting Einstein’s
famous equation $E = mc^2$). One can normalize the
constants $c$ and $mc^2/\hbar$ so that they both equal 1
by applying a suitable change of variables to time and
space.

Though all five equations mentioned above first
appeared in connection with specific physical
phenomena, such as heat transfer for (2) and propagation
of electromagnetic waves for (3), they have, miraculously,
a range of relevance far beyond their original
applications. In particular there is no reason to restrict
their study to three space dimensions: it is very easy to
generalize them to similar equations in $n$ variables
$x_1, x_2, \ldots, x_n$.

All the PDEs listed so far obey a simple but
fundamental property called the principle of
superposition: if $u_1$ and $u_2$ are two solutions to one
of these equations, then any linear combination
$a_1 u_1 + a_2 u_2$ of these solutions is also a solution.
In other words, the space of all solutions is a vector
space [I.3 §2.3]. Equations that obey this property are
known as homogeneous linear equations. If the space of
solutions is an affine space (that is, a translate of a
vector space) rather than a vector space, we say that the
PDE is an inhomogeneous linear equation; a good example is
Poisson’s equation:

$$\Delta u = f, \tag{6}$$

where $f : \mathbb{R}^3 \to \mathbb{R}$ is a function that
is given to us and $u : \mathbb{R}^3 \to \mathbb{R}$ is the
unknown function. Equations that are neither homogeneous
linear nor inhomogeneous linear are known as nonlinear.
The following equation, the minimal-surface equation
[III.96 §3.1], is manifestly nonlinear:


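$$\partial_1 \Bigl( \frac{\partial_1 u}{(1 + |\partial_1 u|^2 + |\partial_2 u|^2)^{1/2}} \Bigr) + \partial_2 \Bigl( \frac{\partial_2 u}{(1 + |\partial_1 u|^2 + |\partial_2 u|^2)^{1/2}} \Bigr) = 0. \tag{7}$$

The graphs of solutions $u : \mathbb{R}^2 \to \mathbb{R}$
of this equation are area-minimizing surfaces (like soap
films).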
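
The difference between linear and nonlinear operators can be seen numerically. The following Python sketch, which is only an illustration, approximates two operators by finite differences in two variables: the Laplacian $\partial_1^2 u + \partial_2^2 u$ and the minimal-surface operator $\partial_1(\partial_1 u / W) + \partial_2(\partial_2 u / W)$ with $W = (1 + |\partial_1 u|^2 + |\partial_2 u|^2)^{1/2}$. It then compares the operator applied to a linear combination $2u_1 + 3u_2$ with the same linear combination of the outputs. The step size `H`, the sample functions `u1` and `u2`, and the evaluation point are arbitrary choices made for this sketch.

```python
import math

H = 1e-3  # finite-difference step (an arbitrary choice for this sketch)


def laplacian(u):
    """Five-point finite-difference approximation to the 2D Laplacian of u."""
    def result(x, y):
        return (u(x + H, y) + u(x - H, y) + u(x, y + H) + u(x, y - H)
                - 4 * u(x, y)) / H ** 2
    return result


def minimal_surface_operator(u):
    """Finite-difference approximation to the minimal-surface operator."""
    def flux(x, y):
        # Gradient of u divided by sqrt(1 + |grad u|^2).
        ux = (u(x + H, y) - u(x - H, y)) / (2 * H)
        uy = (u(x, y + H) - u(x, y - H)) / (2 * H)
        w = math.sqrt(1 + ux ** 2 + uy ** 2)
        return ux / w, uy / w

    def result(x, y):
        d1 = (flux(x + H, y)[0] - flux(x - H, y)[0]) / (2 * H)
        d2 = (flux(x, y + H)[1] - flux(x, y - H)[1]) / (2 * H)
        return d1 + d2
    return result


def u1(x, y):
    return x * x


def u2(x, y):
    return x * y


def combo(x, y):
    # A linear combination 2*u1 + 3*u2 of the two sample functions.
    return 2 * u1(x, y) + 3 * u2(x, y)


if __name__ == "__main__":
    point = (0.5, 0.25)
    for op in (laplacian, minimal_surface_operator):
        lhs = op(combo)(*point)
        rhs = 2 * op(u1)(*point) + 3 * op(u2)(*point)
        print(op.__name__, lhs, rhs)
```

For the Laplacian the two printed numbers agree up to rounding, which is the linearity behind the principle of superposition; for the minimal-surface operator they differ substantially.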

Equations (1), (2), (3), (4), (5) are not just linear: they
are all examples of constant-coefficient linear equations.
This means that they can be expressed in the form

$$\mathcal{P}[u] = 0, \tag{8}$$

where $\mathcal{P}$ is a differential operator that
involves linear combinations, with constant real or
complex coefficients, of mixed partial derivatives of $u$.
(Such operators are called constant-coefficient linear
differential operators.) For instance, in the case of the
Laplace equation (1), $\mathcal{P}$ is simply the
Laplacian $\Delta$, while for the wave equation (3),
$\mathcal{P}$ is the d’Alembertian

$$\mathcal{P} = \Box = -\partial_t^2 + \partial_1^2 + \partial_2^2 + \partial_3^2.$$

The characteristic feature of linear constant-coefficient
operators is translation invariance. Roughly speaking,
this means that if you translate a function $u$, then you
translate $\mathcal{P}u$ in the same way. More precisely,
if $v(x)$ is defined to be $u(x - a)$ (so the value of $u$
at $x$ becomes the value of $v$ at $x + a$; note that $x$
and $a$ belong to $\mathbb{R}^3$ here), then
$\mathcal{P}v(x)$ is equal to $\mathcal{P}u(x - a)$. As a
consequence of this basic fact we infer that solutions to
the homogeneous, linear, constant-coefficient equation (8)
are still solutions when translated.

Since symmetries play such a fundamental role in
PDEs we should stop for a moment to make a general
definition. A symmetry of a PDE is any invertible
operation $T : u \mapsto T(u)$ from functions to functions
that preserves the space of solutions, in the sense that
$u$ solves the PDE if and only if $T(u)$ solves the same
PDE. A PDE with this property is then said to be invariant
under the symmetry $T$. The symmetry $T$ is often a linear
operation, though this does not have to be the case. The
composition of two symmetries is again a symmetry, as is
the inverse of a symmetry, and so it is natural to view a
collection of symmetries as forming a group [I.3 §2.1]
(which is typically a finite- or infinite-dimensional Lie
group [III.50 §1]).

Because the translation group is intimately connected
with the Fourier transform [III.27] (indeed,
the latter can be viewed as the representation theory
of the former), this symmetry strongly suggests that
Fourier analysis should be a useful tool to solve
constant-coefficient PDEs, and this is indeed the case.

Our basic constant-coefficient linear operators, the
Laplacian $\Delta$ and the d’Alembertian $\Box$, are
formally similar in many respects. The Laplacian is
fundamentally associated with the geometry of Euclidean
space [I.3 §6.2] $\mathbb{R}^3$ and the d’Alembertian is
similarly associated with the geometry of Minkowski space
[I.3 §6.8] $\mathbb{R}^{1+3}$. This means that the
Laplacian commutes with all the rigid motions of the
Euclidean space $\mathbb{R}^3$, while the d’Alembertian
commutes with the corresponding class of Poincaré
transformations of Minkowski spacetime. In the former case
this simply means that invariance applies to all
transformations of $\mathbb{R}^3$ that preserve the
Euclidean distances between points. In the case of the
wave equation, the Euclidean distance has to be replaced
by the spacetime distance between points (which would be
called events in the language of relativity): if
$P = (t, x_1, x_2, x_3)$ and $Q = (s, y_1, y_2, y_3)$,
then the distance between them is given by the formula

$$d_M(P, Q)^2 = -(t - s)^2 + (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2.$$

As a consequence of this basic fact we infer that the
space of solutions to the wave equation (3) is invariant
under translations and Lorentz transformations [I.3 §6.8].

Our other evolution equations (2) and (4) are clearly
invariant under rotations of the space variables
$x = (x_1, x_2, x_3) \in \mathbb{R}^3$, when $t$ is fixed.
They are also Galilean invariant, which means, in the
particular case of the Schrödinger equation (4), that
whenever $u = u(t, x)$ is a solution so is the function
$u_v(t, x) = e^{i(x \cdot v)/2k}\, e^{-it|v|^2/4k}\, u(t, x - vt)$
for any vector $v \in \mathbb{R}^3$.

Poisson’s equation (6), on the other hand, is an
example of a constant-coefficient inhomogeneous linear
equation, which means that it takes the form

$$\mathcal{P}[u] = f \tag{9}$$

for some constant-coefficient linear differential
operator $\mathcal{P}$ and known function $f$. To solve
such an equation requires one to understand the
invertibility or otherwise of the linear operator
$\mathcal{P}$: if it is invertible then $u$ will equal
$\mathcal{P}^{-1} f$, and if it is not invertible then
either there will be no solution or there will be
infinitely many solutions. Inhomogeneous equations are
closely related to their homogeneous counterpart; for
instance, if $u_1$, $u_2$ both solve the inhomogeneous
equation (9) with the same inhomogeneous term $f$, then
their difference $u_1 - u_2$ solves the corresponding
homogeneous equation (8).


Linear homogeneous PDEs satisfy the principle of 
superposition but they do not have to be translation 
invariant. For example, suppose that we modify the 
heat equation (2) so that the coefficient k is no longer 
constant but rather an arbitrary, positive, smooth fune- 
tion of (xi,X2,X3). Such an equation models the flow 
of heat in a medium in which the rate of diffusion 
varies from point to point. The corresponding space 
of solutions is not translation invariant (which is not 
surprising as the medium in which the heat flows is 
not translation invariant). Equations like this are called 
linear equations with variable coefficients. It is more 
difficult to solve them and describe their qualitative 
features than it is for constant-coefficient equations. 
(See, for example, stochastic processes [IV.24 §5.2] 
for an approach to equations of type (2) with variable 
$k$.) Finally, nonlinear equations such as (7) can often
still be written in the form (8), but the operator $P$ is
now a nonlinear differential operator. For instance, the
relevant operator for (7) is given by the formula

$$P[u] = \partial_1\!\left(\frac{\partial_1 u}{(1+|\partial u|^2)^{1/2}}\right) + \partial_2\!\left(\frac{\partial_2 u}{(1+|\partial u|^2)^{1/2}}\right),$$

where $|\partial u|^2 = (\partial_1 u)^2 + (\partial_2 u)^2$. Operators such as these
are clearly not linear. However, because they are ultimately
constructed from algebraic operations and partial
derivatives, both of which are “local” operations,
we observe the important fact that $P$ is at least still
a “local” operator. More precisely, if $u_1$ and $u_2$ are
two functions that agree on some open set $D$, then the
expressions $P[u_1]$ and $P[u_2]$ also agree on this set. In
particular, if $P[0] = 0$ (as is the case in our example),
then whenever $u$ vanishes on a domain, $P[u]$ will also
vanish on that domain.

So far we have tacitly assumed that our equations
take place in the whole of a space such as $\mathbb{R}^3$, $\mathbb{R}^+ \times \mathbb{R}^3$,
or $\mathbb{R} \times \mathbb{R}^3$. In reality one is often restricted to a fixed
domain of that space. Thus, for example, equation (1) is
usually studied on a bounded open domain of $\mathbb{R}^3$
subject to a specified boundary condition. Here are some
basic examples of boundary conditions. 

Example. The Dirichlet problem for Laplace’s equation
on an open domain $D \subset \mathbb{R}^3$ is the problem of finding
a function $u$ that behaves in a prescribed way on the
boundary of $D$ and obeys the Laplace equation inside.

More precisely, one specifies a continuous function
$u_0 : \partial D \to \mathbb{R}$ and looks for a continuous function $u$,
defined on the closure $\bar{D}$ of $D$, that is twice continuously
differentiable inside $D$ and solves the equations

$$\Delta u(x) = 0 \ \text{ for all } x \in D, \qquad u(x) = u_0(x) \ \text{ for all } x \in \partial D. \tag{10}$$

A basic result in PDEs asserts that if the domain $D$ has
a sufficiently smooth boundary, then there is exactly
one solution to the problem (10) for any prescribed
function $u_0$ on the boundary $\partial D$.
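
As a rough numerical illustration of the Dirichlet problem (10), the following sketch solves a discretized version on the unit square in two dimensions rather than on a domain of $\mathbb{R}^3$. The grid size, the Gauss-Seidel iteration, the number of sweeps, and the names `solve_dirichlet` and `harmonic` are all assumptions made for this example and do not come from the text. The boundary data is taken from the harmonic function $x^2 - y^2$, which the five-point scheme happens to reproduce exactly.

```python
def solve_dirichlet(n, boundary, sweeps=500):
    """Gauss-Seidel iteration for the five-point discrete Laplace equation
    on the unit square, using an (n+1) x (n+1) grid (illustrative sketch)."""
    h = 1.0 / n
    # Boundary nodes get the prescribed data; interior nodes start from 0.
    u = [[boundary(i * h, j * h) if i in (0, n) or j in (0, n) else 0.0
          for j in range(n + 1)] for i in range(n + 1)]
    for _ in range(sweeps):
        for i in range(1, n):
            for j in range(1, n):
                # Discrete mean-value property: average of the four neighbours.
                u[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j]
                                  + u[i][j + 1] + u[i][j - 1])
    return u


def harmonic(x, y):
    # x^2 - y^2 satisfies Laplace's equation, so it can serve as test data.
    return x * x - y * y


n = 8
u = solve_dirichlet(n, harmonic)
max_err = max(abs(u[i][j] - harmonic(i / n, j / n))
              for i in range(n + 1) for j in range(n + 1))
```

Here `max_err` measures how far the computed interior values are from the harmonic function that supplied the boundary data; it should be at the level of rounding error.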

Example. The Plateau problem is the problem of find- 
ing the surface of minimal total area that bounds a 
given curve. 

When the surface is the graph of a function $u$ on
some suitably smooth domain $D$, in other words a set
of the form $\{(x, y, u(x,y)) : (x,y) \in D\}$, and the
bounding curve is the graph of a function $u_0$ over
the boundary $\partial D$ of $D$, then this problem turns out
to be equivalent to the Dirichlet problem (10), but
with the linear equation (1) replaced by the nonlinear
equation (7). For the above equations, it is also
often natural to replace the Dirichlet boundary condition
$u(x) = u_0(x)$ on the boundary $\partial D$ with another
boundary condition, such as the Neumann boundary
condition $n(x) \cdot \nabla_x u(x) = u_1(x)$ on $\partial D$, where $n(x)$ is
the outward normal (of unit length) to $D$ at $x$. Generally
speaking, Dirichlet boundary conditions correspond to 
“absorbing” or “fixed” barriers in physics, whereas Neu- 
mann boundary conditions correspond to “reflecting” 
or “free” barriers. 

Natural boundary conditions can also be imposed for 
our evolution equations (2)-(4). The simplest one is to 
prescribe the values of $u$ when $t = 0$. We can think of
this more geometrically. We are prescribing the values
of $u$ at each spacetime point of the form $(0, x, y, z)$, and
the set of all such points is a hyperplane in $\mathbb{R}^{1+3}$: it is
an example of an initial time surface. 

Example. The Cauchy problem (or initial-value problem,
sometimes abbreviated to IVP) for the heat equation (2)
asks for a solution $u : \mathbb{R}^+ \times \mathbb{R}^3 \to \mathbb{R}$ on the
spacetime domain $\mathbb{R}^+ \times \mathbb{R}^3 = \{(t,x) : t > 0,\ x \in \mathbb{R}^3\}$,
which equals a prescribed function $u_0 : \mathbb{R}^3 \to \mathbb{R}$ on the
initial time surface $\{0\} \times \mathbb{R}^3 = \partial(\mathbb{R}^+ \times \mathbb{R}^3)$.

In other words, the Cauchy problem asks for a sufficiently
smooth function $u$, defined on the closure
of $\mathbb{R}^+ \times \mathbb{R}^3$ and taking values in $\mathbb{R}$, that satisfies the
conditions

$$\begin{aligned}
-\partial_t u(t,x) + k\Delta u(t,x) &= 0 && \text{for every } (t,x) \in \mathbb{R}^+ \times \mathbb{R}^3, \\
u(0,x) &= u_0(x) && \text{for every } x \in \mathbb{R}^3.
\end{aligned} \tag{11}$$


The function $u_0$ is often referred to as the initial conditions,
or initial data, or just data, for the problem.
Under suitable smoothness and decay conditions, one
can show that this equation has exactly one solution
$u$ for each choice of data $u_0$. Interestingly, this assertion
fails if one replaces the future domain
$\mathbb{R}^+ \times \mathbb{R}^3 = \{(t,x) : t > 0,\ x \in \mathbb{R}^3\}$ by the past domain
$\mathbb{R}^- \times \mathbb{R}^3 = \{(t,x) : t < 0,\ x \in \mathbb{R}^3\}$.

A similar formulation of the IVP holds for the Schrödinger
equation (4), though in this case we can solve
both to the past and to the future. However, in the case
of the wave equation (3) we need to specify not just the
initial position $u(0,x) = u_0(x)$ on the initial time surface
$t = 0$, but also an initial velocity $\partial_t u(0,x) = u_1(x)$,
since equation (3) (unlike (2) or (4)) cannot formally
determine $\partial_t u$ in terms of $u$. One can construct unique
smooth solutions (both to the future and to the past of
the initial hyperplane $t = 0$) to the IVP for (3) for very
general smooth initial conditions $u_0$, $u_1$.

Many other boundary-value problems are possible. 
For instance, when analyzing the evolution of a wave 
in a bounded domain $D$ (such as a sound wave), it is
natural to work with the spacetime domain $\mathbb{R} \times D$ and
prescribe both Cauchy data (on the initial boundary
$\{0\} \times D$) and Dirichlet or Neumann data (on the spatial
boundary $\mathbb{R} \times \partial D$). On the other hand, when the physical
problem under consideration is the evolution of a
wave outside a bounded obstacle (for example, an electromagnetic
wave), one considers instead the evolution
in $\mathbb{R} \times (\mathbb{R}^3 \setminus D)$ with a boundary condition on $D$.

The choice of boundary condition and initial condi- 
tions for a given PDE is very important. For equations 
of physical interest these arise naturally from the con- 
text in which they are derived. For example, in the case 
of a vibrating string, which is described by solutions of 
the one-dimensional wave equation $\partial_t^2 u - \partial_x^2 u = 0$ in
the domain $(a, b) \times \mathbb{R}$, the initial conditions $u = u_0$ and
$\partial_t u = u_1$ at $t = t_0$ amount to specifying the original
position and velocity of the string. The boundary condition
$u(a) = u(b) = 0$ is what tells us that the two
ends of the string are fixed. 
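
The roles of the initial and boundary conditions for the vibrating string can be seen in a small finite-difference sketch. Everything specific here is an assumption for illustration: the string occupies $(0, 1)$, the initial velocity is zero, the time step equals the grid spacing, and the names `vibrate` and `shape` are invented. With this choice of time step the leapfrog scheme reproduces the standing wave $\sin(\pi x)\cos(\pi t)$ up to rounding error.

```python
import math


def vibrate(n, steps, u0):
    """Leapfrog scheme for u_tt - u_xx = 0 on (0, 1) with fixed ends,
    zero initial velocity, and time step equal to the grid spacing 1/n.
    Returns the grid values after `steps` >= 1 time steps."""
    prev = [u0(j / n) for j in range(n + 1)]
    prev[0] = prev[n] = 0.0                 # boundary condition u(0) = u(1) = 0
    # First step for zero initial velocity: average of the two neighbours.
    curr = [0.0] * (n + 1)
    for j in range(1, n):
        curr[j] = 0.5 * (prev[j + 1] + prev[j - 1])
    for _ in range(steps - 1):
        nxt = [0.0] * (n + 1)               # end points stay fixed at 0
        for j in range(1, n):
            nxt[j] = curr[j + 1] + curr[j - 1] - prev[j]
        prev, curr = curr, nxt
    return curr


n = 50
shape = lambda x: math.sin(math.pi * x)     # initial position of the string
half_period = vibrate(n, n, shape)          # n steps of size 1/n, i.e. time t = 1
```

After half a period the string should have flipped, so `half_period[j]` is expected to be close to `-shape(j / n)` at every grid point.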

So far we have considered just scalar equations. 
These are equations where there is only one unknown 
function u, which takes values either in the real num- 
bers R or in the complex numbers C. However, many 
important PDEs involve either multiple unknown scalar 
functions or (equivalently) functions that take values
in a multidimensional vector space such as $\mathbb{R}^m$. In
such cases, we say that we have a system of PDEs. An
important example of a system is that of the Cauchy-Riemann
equations [1.3 §5.6]:

$$\partial_1 u_2 - \partial_2 u_1 = 0, \qquad \partial_1 u_1 + \partial_2 u_2 = 0, \tag{12}$$

where $u_1, u_2 : \mathbb{R}^2 \to \mathbb{R}$ are real-valued functions on the
plane. It was observed by Cauchy [VI.29] that a complex
function $w(x+iy) = u_1(x,y) + iu_2(x,y)$ is holomorphic
[1.3 §5.6] if and only if its real and imaginary
parts $u_1$, $u_2$ satisfy the system (12). This system can
still be represented in the form of a constant-coefficient
linear PDE (8), but $u$ is now a vector $\binom{u_1}{u_2}$, and $P$ is
not a scalar differential operator, but rather a matrix
of operators $\begin{pmatrix} -\partial_2 & \partial_1 \\ \partial_1 & \partial_2 \end{pmatrix}$.

The system (12) contains two equations and two 
unknowns. This is the standard situation for a deter- 
mined system. Roughly speaking, a system is called 
overdetermined if it contains more equations than 
unknowns and underdetermined if it contains fewer 
equations than unknowns. Underdetermined equations 
typically have infinitely many solutions for any given 
set of prescribed data; conversely, overdetermined 
equations tend to have no solutions at all, unless some 
additional compatibility conditions are imposed on the 
prescribed data. 

Observe also that the Cauchy-Riemann operator $P$
has the following remarkable property:

$$P^2[u] = P[P[u]] = \begin{pmatrix} \Delta u_1 \\ \Delta u_2 \end{pmatrix}.$$
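
The identity for $P^2$ can be checked by multiplying the matrix of operators by itself. The computation below assumes that $P$ is the matrix with rows $(-\partial_2, \partial_1)$ and $(\partial_1, \partial_2)$, as read off from the system (12), and that it acts on functions smooth enough for the mixed partial derivatives to commute.

```latex
P^2
= \begin{pmatrix} -\partial_2 & \partial_1 \\ \partial_1 & \partial_2 \end{pmatrix}
  \begin{pmatrix} -\partial_2 & \partial_1 \\ \partial_1 & \partial_2 \end{pmatrix}
= \begin{pmatrix}
    \partial_2^2 + \partial_1^2 & -\partial_2\partial_1 + \partial_1\partial_2 \\
    -\partial_1\partial_2 + \partial_2\partial_1 & \partial_1^2 + \partial_2^2
  \end{pmatrix}
= \begin{pmatrix} \Delta & 0 \\ 0 & \Delta \end{pmatrix},
\qquad \Delta = \partial_1^2 + \partial_2^2 .
```

The off-diagonal entries cancel precisely because $\partial_1\partial_2 = \partial_2\partial_1$, so $P^2$ applies the Laplacian to each component of $u$ separately.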

Thus $P$ can be viewed as a square root of the two-dimensional
Laplacian $\Delta$. One can define a similar type
of square root for the Laplacian in higher dimensions
and, more surprisingly, even for the d’Alembertian
operator $\Box$ in $\mathbb{R}^{1+3}$. To achieve this we need to have
four $4 \times 4$ complex matrices $\gamma^0, \gamma^1, \gamma^2, \gamma^3$ that satisfy
the property

$$\gamma^\alpha \gamma^\beta + \gamma^\beta \gamma^\alpha = -2 m^{\alpha\beta} I.$$

Here, $I$ is the unit $4 \times 4$ matrix and $m^{\alpha\beta} = -1$ when
$\alpha = \beta = 0$, $1$ when $\alpha = \beta \neq 0$, and $0$ otherwise. Using
the $\gamma$ matrices we can introduce the Dirac operator as
follows. If $u = (u_1, u_2, u_3, u_4)$ is a function in $\mathbb{R}^{1+3}$
with values in $\mathbb{C}^4$, then we set $Du = i\gamma^\alpha \partial_\alpha u$
(summing over the repeated index $\alpha$). It is easy
to check that, indeed, $D^2 u = \Box u$. The equation

$$Du = ku \tag{13}$$

is called the Dirac equation and it is associated with a
free, massive, relativistic particle such as an electron.

One can extend the concept of a PDE further to cover
unknowns that are not, strictly speaking, functions
taking values in a vector space, but are instead sections
of a vector bundle [IV.6 §5], or perhaps a map
from one manifold [1.3 §6.9] to another; such generalized
PDEs play an important role in geometry and
modern physics. A fundamental example is given by
the Einstein field equations [IV.13]. In the simplest,
“vacuum,” case, they take the form

$$\mathrm{Ric}(g) = 0, \tag{14}$$

where $\mathrm{Ric}(g)$ is the Ricci curvature [III.80] tensor of
the spacetime manifold $M = (M, g)$. In this case the
spacetime metric itself is the unknown to be solved for. 
One can often reduce such equations locally to more 
traditional PDE systems by selecting a suitable choice of 
coordinates, but the task of selecting a “good” choice of 
coordinates, and working out how different choices are 
compatible with each other, is a nontrivial and impor- 
tant one. Indeed, the task of selecting a good set of 
coordinates in order to solve a PDE can end up being a 
significant PDE problem in its own right. 

PDEs are ubiquitous throughout mathematics and 
science. They provide the basic mathematical frame- 
work for some of the most important physical theo- 
ries: elasticity, hydrodynamics, electromagnetism, gen- 
eral relativity, and nonrelativistic quantum mechanics, 
for example. The more modern relativistic quantum
field theories lead, in principle, to equations in an infinite
number of unknowns, which lie beyond the scope
of PDEs. Yet, even in that case, the basic equations preserve
the locality property of PDEs. Moreover, the starting
point of a quantum field theory [IV.17 §2.1.4] is
always a classical field theory, which is described by
systems of PDEs. This is the case, for example, in the
standard model of weak and strong interactions, which
is based on the so-called Yang-Mills-Higgs field theory.
If we also include the ordinary differential equations
of classical mechanics, which can be viewed as one-dimensional
PDEs, we see that essentially all of physics
is described by differential equations. As examples of
PDEs underlying some of our most basic physical theories
we refer to the articles that discuss the Euler and
Navier-Stokes equations [III.23], the heat equation
[III.36], the Schrödinger equation [III.85], and
the Einstein equations [IV.13].

An important feature of the main PDEs is their appar- 
ent universality. Thus, for example, the wave equation, 
first introduced by d’Alembert [VI.20] to describe the
motion of a vibrating string, was later found to be
connected with the propagation of sound and electromagnetic
waves. The heat equation, first introduced by
Fourier [VI.25] to describe heat propagation, appears
in many other situations in which dissipative effects
play an important role. The same can be said about the
Laplace equation, the Schrödinger equation, and many
other basic equations. 

It is even more surprising that equations that were 
originally introduced to describe specific physical phe- 
nomena have played a fundamental role in several areas 
of mathematics that are considered to be “pure,” such 
as complex analysis, differential geometry, topology, 
and algebraic geometry. Complex analysis, for exam- 
ple, which studies the properties of holomorphic func- 
tions, can be regarded as the study of solutions to 
the Cauchy-Riemann equations (12) in a domain of $\mathbb{R}^2$.
Hodge theory is based on studying the space of solu- 
tions to a class of linear systems of PDEs on manifolds 
that generalize the Cauchy-Riemann equations: it plays 
a fundamental role in topology and algebraic geometry. 
The Atiyah-Singer index theorem [V.2] is formulated
in terms of a special class of linear PDEs on manifolds,
related to the Euclidean version of the Dirac operator.
Important geometric problems can be reduced to
finding solutions to specific PDEs, typically nonlinear.
We have already seen one example: the Plateau problem
of finding surfaces of minimal total area that pass
through a given curve. Another striking example is the
uniformization theorem [V.37] in the theory of surfaces,
which takes a compact Riemannian surface $S$ (a
two-dimensional surface with a Riemannian metric
[1.3 §6.10]) and, by solving the PDE

$$\Delta_S u + e^{2u} = K \tag{15}$$

(which is a nonlinear variant of the Laplace equation
(1)), uniformizes the metric so that it is “equally curved”
at all points on the surface (or, more precisely, has
constant scalar curvature [III.80]) without changing
the conformal class of the metric (i.e., without distorting
any of the angles subtended by curves on the surface).
This theorem is of fundamental importance to
the theory of such surfaces: in particular, it allows one
to give a topological classification of compact surfaces
in terms of a single number $\chi(S)$, which is called the
Euler characteristic [1.4 §2.2] of the surface $S$. The
three-dimensional analogue of the uniformization theorem,
the geometrization conjecture [IV.7 §2.4] of
Thurston, has recently been established by Perelman,
who did so by solving yet another PDE; in this case, the
equation is the Ricci flow [III.80] equation

$$\partial_t g = 2\,\mathrm{Ric}(g), \tag{16}$$


which can be transformed into a nonlinear version of 
the heat equation (2) after a carefully chosen change 
of coordinates. The proof of the geometrization conjecture
is a decisive step toward the total classification
of all three-dimensional compact manifolds, in
particular establishing the well-known Poincaré conjecture
[IV.7 §2.4]. To overcome the many technical
details in establishing this conjecture, one needs to 
make a detailed qualitative analysis of the behavior 
of solutions to the Ricci flow equation, a task which 
requires just about all the advances made in geometric 
PDEs in the last hundred years. 

Finally, we note that PDEs arise not only in physics
and geometry but also in many fields of applied science.
In engineering, for example, one often wants to
control some feature of the solution $u$ to a PDE by carefully
selecting whatever components of the given data
one can directly influence; consider, for instance, how
a violinist controls the solution to the vibrating string
equation (closely related to (3)) by modulating the force 
and motion of a bow on that string in order to produce a 
beautiful sound. The mathematical theory dealing with 
these types of issues is called control theory. 

When dealing with complex physical systems, one
cannot possibly have complete information about the
state of the system at any given time. Instead, one
often makes certain randomness assumptions about
various factors that influence it. This leads to the very
important class of equations called stochastic differential
equations (SDEs), where one or more components of
the equation involve a random variable [III.73 §4] of
some sort. An example of this is in the Black-Scholes
model [VII.9 §2] in mathematical finance. A general
discussion of SDEs can be found in stochastic processes
[IV.24 §6].

The plan for the rest of this article is as follows. In 
section 2 I shall describe some of the basic notions 
and achievements of the general theory of PDEs. The 
main point I want to make here is that, in contrast 
with ordinary differential equations, for which a gen- 
eral theory is both possible and useful, partial differen- 
tial equations do not lend themselves to a useful gen- 
eral theoretical treatment because of some important 
obstructions that I shall try to describe. One is thus 
forced to discuss special classes of equations such as 
elliptic, parabolic, hyperbolic, and dispersive equations. 
In section 3 I will try to argue that, despite the impossibility
of developing a useful general theory that encompasses
all, or most, of the important examples, there is
nevertheless an impressive unifying body of concepts
and methods for dealing with various basic equations,
and this gives PDEs the feel of a well-defined area of
mathematics. In section 4 I develop this further by trying
to identify some common features in the derivation
of the main equations that are dealt with in the subject. 
An additional source of unity for PDEs is the central 
role played by the issues of regularity and breakdown 
of solutions, which is discussed only briefly here. In the 
final section we shall discuss some of the main goals 
that can be identified as driving the subject. 

2 General Equations 

One might expect, after looking at other areas of math- 
ematics such as algebraic geometry or topology, that 
there was a very general theory of PDEs that could be 
specialized to various specific cases. As I shall argue 
below, this point of view is seriously flawed and very 
much out of fashion. It does, however, have important 
merits, which I hope to illustrate in this section. I shall 
avoid giving formal definitions and focus instead on 
representative examples. The reader who wants more 
precise definitions can consult the online version of
this article. 

For simplicity we shall look mostly at determined 
systems of PDEs. The simplest distinction, which we 
have already made, is between scalar equations, such 
as (1)-(5), which consist of only one equation and one
unknown, and systems of equations, such as (12) and
(13). Another simple but important concept is that of
the order of a PDE, which is defined to be the highest
derivative that appears in the equation; this concept is
analogous to that of the degree of a polynomial. For
instance, the five basic equations (1)-(5) listed earlier
are second order in space, although some (such as (2)
or (4)) are only first order in time. Equations (12) and
(13), as well as the Maxwell equations, are first order.¹

We have seen that PDEs can be divided into linear and 
nonlinear equations, with the linear equations being 
divided further into constant-coefficient and variable- 
coefficient equations. One can also divide nonlinear 
PDEs into several further classes depending on the 
“strength” of the nonlinearity. At one end of the scale, 
a semilinear equation is one in which all the nonlinear 
components of the equation have strictly lower order 
than the linear components. For instance, equation (15) 
is semilinear, because the nonlinear component $e^{2u}$ is
of zero order, i.e., it contains no derivatives, whereas
the linear component $\Delta_S u$ is of second order. These
equations are close enough to being linear that they can
often be effectively viewed as perturbations of a linear
equation. A more strongly nonlinear class of equations
is that of quasilinear equations, in which the highest-order
derivatives of $u$ appear in the equation only in
a linear manner but the coefficients attached to those
derivatives may depend in some nonlinear manner on
lower-order derivatives. For instance, the second-order
equation (7) is quasilinear, because if one uses the
product rule to expand the equation, then it takes the
quasilinear form

$$F_{11}(\partial_1 u, \partial_2 u)\,\partial_1^2 u + F_{12}(\partial_1 u, \partial_2 u)\,\partial_1 \partial_2 u + F_{22}(\partial_1 u, \partial_2 u)\,\partial_2^2 u = 0$$

for some explicit algebraic functions $F_{11}$, $F_{12}$, $F_{22}$ of the
lower-order derivatives of $u$. While quasilinear equations
can still sometimes be analyzed by perturbative
techniques, this is generally more difficult to accomplish
than it is for an analogous semilinear equation.

Finally, we have fully nonlinear equations, which exhibit
no linearity properties whatsoever. A typical example is
the Monge-Ampère equation

$$\det(D^2 u) = F(x, u, Du),$$

where $u : \mathbb{R}^n \to \mathbb{R}$ is the unknown function, $Du$ is the
gradient [1.3 §5.3] of $u$, $D^2 u = (\partial_i \partial_j u)_{1 \le i,j \le n}$ is the
Hessian matrix of $u$, and $F : \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$ is a
given function. This equation arises in many geometric
contexts, ranging from manifold-embedding problems
to the complex geometry of Calabi-Yau manifolds
[III.6]. Fully nonlinear equations are among the most
difficult and least well-understood of all PDEs.

1. There is a simple trick, well-known in ordinary differential
equations, for converting higher-order equations into a lower-order
(or even first-order) system of equations by increasing the number of
unknowns. See the discussion in dynamics [IV.14 §1.2].

Remark. Most of the basic equations of physics, such 
as the Einstein equations, are quasilinear. However, 
fully nonlinear equations arise in the theory of char- 
acteristics of linear PDEs, which we discuss below, and 
also in geometry. 


2.1 First-Order Scalar Equations

It turns out that first-order scalar PDEs in any number
of dimensions can be reduced to systems of first-order
ODEs. As a simple illustration of this important
fact consider the following equation in two space
dimensions:

$$a^1(x^1,x^2)\,\partial_1 u(x^1,x^2) + a^2(x^1,x^2)\,\partial_2 u(x^1,x^2) = f(x^1,x^2), \tag{17}$$

where $a^1$, $a^2$, $f$ are given real functions in the variables
$x = (x^1, x^2) \in \mathbb{R}^2$. We associate with (17) the first-order
$2 \times 2$ system

$$\frac{dx^1}{ds}(s) = a^1(x^1(s), x^2(s)), \qquad \frac{dx^2}{ds}(s) = a^2(x^1(s), x^2(s)). \tag{18}$$

To simplify matters, let us assume that $f = 0$.

Suppose now that $x(s) = (x^1(s), x^2(s))$ is a solution
of (18), and let us consider how $u(x^1(s), x^2(s))$ varies
as $s$ varies. By the chain rule we know that

$$\frac{d}{ds} u = \partial_1 u \,\frac{dx^1}{ds} + \partial_2 u \,\frac{dx^2}{ds},$$

and equations (17) and (18) imply that this equals zero
(by our assumption that $f = 0$). In other words, any
solution $u = u(x^1, x^2)$ of (17) with $f = 0$ is constant
along any parametrized curve of the form $x(s) =
(x^1(s), x^2(s))$ that satisfies (18).
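
The statement that $u$ is constant along solutions of (18) can be tested numerically. In the sketch below the coefficients $a^1 = -x^2$, $a^2 = x^1$ and the solution $u = (x^1)^2 + (x^2)^2$ are hypothetical choices made only for this example, and the classical Runge-Kutta scheme is just one convenient way of integrating the characteristic system.

```python
def rk4_step(field, x, ds):
    """One classical Runge-Kutta step for dx/ds = field(x), with x a pair."""
    def shifted(p, k, c):
        return (p[0] + c * k[0], p[1] + c * k[1])
    k1 = field(x)
    k2 = field(shifted(x, k1, ds / 2))
    k3 = field(shifted(x, k2, ds / 2))
    k4 = field(shifted(x, k3, ds))
    return (x[0] + ds * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            x[1] + ds * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)


def coefficients(x):
    # Hypothetical coefficients (a^1, a^2) = (-x^2, x^1): a rotation field.
    return (-x[1], x[0])


def u(x):
    # Solves -x^2 d_1 u + x^1 d_2 u = 0, i.e. equation (17) with f = 0.
    return x[0] ** 2 + x[1] ** 2


point = (1.0, 0.5)
values = [u(point)]
for _ in range(1000):
    point = rk4_step(coefficients, point, 0.001)   # follow the characteristic
    values.append(u(point))
spread = max(values) - min(values)                  # variation of u along the curve
```

The quantity `spread` should be tiny, reflecting that $u$ does not change along the computed characteristic curve.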

Thus, in principle, if we know the solutions to (18),
which are called characteristic curves for the equation
(17), then we can find all solutions to (17). I say “in principle”
because, in general, the nonlinear system (18) is
not so easy to solve. Nevertheless, ODEs are simpler to
deal with, and the fundamental theorem of ODEs, which
we will discuss later in this section, allows us to solve
(18) at least locally and for a small interval in $s$.

The fact that $u$ is constant along characteristic curves
allows us to obtain important qualitative information
even when we cannot find explicit solutions. For example,
suppose that the coefficients $a^1$, $a^2$ are smooth (or
real analytic) and that the initial data is smooth (or real
analytic) everywhere on the set $\mathcal{H}$ where it is defined,
except at some point $x_0$ where it is discontinuous. Then
the solution $u$ remains smooth (or real analytic) at
all points except along the characteristic curve $\Gamma$ that
starts at $x_0$, or, in other words, along the solution to
(18) that satisfies the initial condition $x(0) = x_0$. That
is, the discontinuity at $x_0$ propagates precisely along
$\Gamma$. We see here the simplest manifestation of an important
principle, which we shall explain in more detail
later: singularities of solutions to PDEs propagate along
characteristics (or, more generally, hypersurfaces).

One can generalize equation (17) to allow the coefficients
$a^1$, $a^2$, and $f$ to depend not only on $x = (x^1, x^2)$
but also on $u$:

$$a^1(x,u(x))\,\partial_1 u(x) + a^2(x,u(x))\,\partial_2 u(x) = f(x,u(x)). \tag{19}$$

The associated characteristic system becomes

$$\frac{dx^1}{ds}(s) = a^1(x(s), u(x(s))), \qquad \frac{dx^2}{ds}(s) = a^2(x(s), u(x(s))). \tag{20}$$

As a special example of (19) consider the scalar
equation in two space dimensions,

$$\partial_t u + u\,\partial_x u = 0, \qquad u(0,x) = u_0(x), \tag{21}$$

which is called the Burgers equation. Here we have set
$a^1(x,u(x)) = 1$ and $a^2(x,u(x)) = u(x)$. With this
choice of $a^1$, $a^2$, we can take $x^1(s)$ to be $s$ in (20). Then,
renaming $x^2(s)$ as $x(s)$, we derive the characteristic
equation in the form

$$\frac{dx}{ds}(s) = u(s, x(s)). \tag{22}$$

For any given solution $u$ of (21) and any characteristic
curve $(s, x(s))$ we have $(d/ds)u(s,x(s)) = 0$. Thus, in
principle, knowing the solutions to (22) should allow us
to determine the solutions to (21). However, this argument
seems worryingly circular, since $u$ itself appears
in (22).

To see how this difficulty can be circumvented, consider
the IVP for (21): that is, look for solutions that
satisfy $u(0,x) = u_0(x)$. Consider an associated characteristic
curve $x(s)$ such that, initially, $x(0) = x_0$.
Then, since $u$ is constant along the curve, we must have
$u(s,x(s)) = u_0(x_0)$. Hence, going back to (22), we infer
that $dx/ds = u_0(x_0)$ and thus $x(s) = x_0 + s u_0(x_0)$. We
thus deduce that

$$u(s, x_0 + s u_0(x_0)) = u_0(x_0), \tag{23}$$

which implicitly gives us the form of the solution $u$.
We see once more, from (23), that if the initial data is
smooth (or real analytic) everywhere except at a point
$x_0$ of the line $t = 0$, then the corresponding solution
is also smooth (or real analytic) everywhere in a small
neighborhood $V$ of $x_0$, except along the characteristic
curve that begins at $x_0$. The smallness of $V$ is necessary
here because new singularities can form at large
scales. Indeed, $u$ has to be constant along the lines
$x + s u_0(x)$, whose slopes depend on $u_0(x)$. At a point
where these lines cross we would obtain different values
of $u$, which is impossible unless $u$ becomes singular
by this point. This blow-up phenomenon occurs for any
smooth, nonconstant initial data $u_0$.
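
The crossing of characteristic lines can be made concrete with a short sketch. It uses the straight lines $x_0 + s\,u_0(x_0)$ from (23); the Gaussian initial data `bump` and the function names are hypothetical choices for illustration, not part of the text.

```python
import math


def characteristic(x0, s, u0):
    """Position at time s of the characteristic of (21) that starts at x0."""
    return x0 + s * u0(x0)


def crossing_time(a, b, u0):
    """Time s > 0 at which the characteristics from a and b meet,
    or None if they never meet for positive s."""
    speed_gap = u0(a) - u0(b)
    if (b - a) * speed_gap <= 0:
        return None                  # the point behind is not catching up
    return (b - a) / speed_gap


bump = lambda x: math.exp(-x * x)    # hypothetical smooth initial data
t_cross = crossing_time(0.0, 1.0, bump)
```

At time `t_cross` the two lines carry the different values `bump(0.0)` and `bump(1.0)` to the same point, which is the incompatibility described above. For increasing data such as `lambda x: x` the function returns `None`, since those two lines only meet at a negative time.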


Remark. There is an important difference between the 
linear equation (17) and the quasilinear equation (19). 
The characteristics of the first depend only on the coefficients
$a^1(x)$, $a^2(x)$, while the characteristics of the
second depend explicitly on a particular solution $u$ of
the equation. In both cases, singularities can only prop- 
agate along the characteristic curves of the equation. 
For nonlinear equations, however, new singularities can 
form at large distance scales, whatever the smoothness 
of the initial data. 


The above procedure extends to fully nonlinear
scalar equations in $\mathbb{R}^d$ such as the Hamilton-Jacobi
equation

$$\partial_t u + H(x, Du) = 0, \qquad u(0,x) = u_0(x), \tag{24}$$

where $u : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$ is the unknown function, $Du$ is the
gradient of $u$, and the Hamiltonian [III.35] $H : \mathbb{R}^d \times
\mathbb{R}^d \to \mathbb{R}$ and the initial data $u_0 : \mathbb{R}^d \to \mathbb{R}$ are given. For
instance, the eikonal equation $\partial_t u = |Du|$ is a special
instance of a Hamilton-Jacobi equation. We associate
with (24) the ODE system

$$\frac{dx^i}{dt} = \frac{\partial H}{\partial p_i}(x(t), p(t)), \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial x^i}(x(t), p(t)), \tag{25}$$

where $i$ runs from 1 to $d$. The equations (25) are known
as a Hamiltonian system of ODEs. The relationship
between this system and the corresponding Hamilton-Jacobi
equation is a little more involved than in the
cases discussed above. Briefly, we can construct a solution
$u$ to (24) based only on the knowledge of the solutions
$(x(t), p(t))$ to (25), which are called the bicharacteristic
curves of the nonlinear PDE. Once again,
singularities can only propagate along bicharacteristic
curves (or hypersurfaces). As in the case of the Burgers
equation, singularities will occur for more or less any 
smooth data. Thus, a classical, continuously differen- 
tiable solution can only be constructed locally in time. 
Both Hamilton-Jacobi equations and Hamiltonian sys- 
tems play a fundamental role in classical mechanics as 
well as in the theory of the propagation of singularities 
in linear PDEs. The deep connection between Hamil- 
tonian systems and first-order Hamilton-Jacobi equa- 
tions played an important role in the introduction of 
the Schrödinger equation into quantum mechanics.
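
A Hamiltonian system of the kind associated with (24) can be explored numerically. The sketch below assumes the standard form $dx/dt = \partial H/\partial p$, $dp/dt = -\partial H/\partial x$ in one space dimension, with a hypothetical Hamiltonian $H(x,p) = p^2/2 + x^4/4$ chosen only for illustration. It checks one well-known feature of such systems: $H$ stays constant along each solution curve.

```python
def hamiltonian(x, p):
    # Hypothetical Hamiltonian, chosen only for illustration.
    return 0.5 * p * p + 0.25 * x ** 4


def flow(state):
    x, p = state
    # dx/dt = dH/dp = p,   dp/dt = -dH/dx = -x^3
    return (p, -x ** 3)


def rk4_step(state, dt):
    """One classical Runge-Kutta step for the pair (x, p)."""
    def nudge(s, k, c):
        return (s[0] + c * k[0], s[1] + c * k[1])
    k1 = flow(state)
    k2 = flow(nudge(state, k1, dt / 2))
    k3 = flow(nudge(state, k2, dt / 2))
    k4 = flow(nudge(state, k3, dt))
    return (state[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            state[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)


state = (1.0, 0.0)
h_start = hamiltonian(*state)
for _ in range(2000):
    state = rk4_step(state, 0.001)       # integrate up to time t = 2
drift = abs(hamiltonian(*state) - h_start)
```

The value `drift` records how much $H$ has changed along the computed curve; it should be very small, and whatever remains is numerical error rather than a feature of the system.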


2.2 The Initial-Value Problem for ODEs 

Before we can continue with our general presentation 
of PDEs we need first to discuss, for the sake of com- 
parison, the IVP for ODEs. Let us start with a first-order 
ODE

$$\partial_x u(x) = f(x, u(x)) \tag{26}$$


subject to the initial condition

$$u(x_0) = u_0. \tag{27}$$

Let us also assume for simplicity that (26) is a scalar
equation and that $f$ is a well-behaved function of $x$ and
$u$, such as $f(x,u) = u^3 - u + 1 + \sin x$. From the initial
data $u_0$ we can determine $\partial_x u(x_0)$ by substituting $x_0$
into (26). If we now differentiate the equation (26) with
respect to $x$ and apply the chain rule, we derive the
equation

$$\partial_x^2 u(x) = \partial_x f(x,u(x)) + \partial_u f(x,u(x))\,\partial_x u(x),$$

which for the example just defined works out to be
$\cos x + 3u^2(x)\,\partial_x u(x) - \partial_x u(x)$. Hence,

$$\partial_x^2 u(x_0) = \partial_x f(x_0,u_0) + \partial_u f(x_0,u_0)\,\partial_x u(x_0),$$

and since $\partial_x u(x_0)$ has already been determined we
find that $\partial_x^2 u(x_0)$ can also be explicitly calculated
from the initial data $u_0$. This calculation also involves
the function $f$ and its first partial derivatives. Taking
higher derivatives of the equation (26) we can recursively
determine $\partial_x^3 u(x_0)$, as well as all other higher
derivatives of $u$ at $x_0$. Therefore, one can in principle
determine $u(x)$ with the help of the Taylor series

$$u(x) = \sum_{k \ge 0} \frac{1}{k!}\,\partial_x^k u(x_0)(x - x_0)^k
= u(x_0) + \partial_x u(x_0)(x - x_0) + \frac{1}{2!}\,\partial_x^2 u(x_0)(x - x_0)^2 + \cdots.$$

We say “in principle” because there is no guarantee
that the series converges. There is, however, a very
important theorem, called the Cauchy-Kowalewski theorem,
which asserts that if the function $f$ is real analytic,
as is certainly the case for our function $f(x,u) =
u^3 - u + 1 + \sin x$, then there will be some neighborhood
$J$ of $x_0$ where the Taylor series converges to a
real-analytic solution $u$ of the equation. It is then easy
to show that the solution thus obtained is the unique
solution to (26) that satisfies the initial condition (27).
To summarize: if $f$ is a well-behaved function, then the
initial-value problem for ODEs has a solution, at least
in some time interval, and that solution is unique.
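
The first two steps of this recursion are easy to carry out explicitly for the example $f(x,u) = u^3 - u + 1 + \sin x$. The sketch below is only an illustration: the function names are invented, and it stops at the second derivative rather than computing the full Taylor series.

```python
import math


def f(x, u):
    return u ** 3 - u + 1 + math.sin(x)


def first_two_derivatives(x0, u0):
    """Return (u'(x0), u''(x0)) for the ODE u' = f(x, u) with u(x0) = u0."""
    d1 = f(x0, u0)                                   # read off from (26)
    # Differentiated equation: d_x f + d_u f * u' with d_x f = cos x, d_u f = 3u^2 - 1.
    d2 = math.cos(x0) + (3 * u0 ** 2 - 1) * d1
    return d1, d2


def taylor2(x, x0, u0):
    """Second-order Taylor polynomial of the solution about x0."""
    d1, d2 = first_two_derivatives(x0, u0)
    return u0 + d1 * (x - x0) + 0.5 * d2 * (x - x0) ** 2
```

For instance, with initial data $u(0) = 0$ the call `first_two_derivatives(0.0, 0.0)` gives first derivative 1 and second derivative 0, so `taylor2` reduces to the straight line $u \approx x$ near the origin.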

The same result does not always hold if we consider 
a more general equation of the form 

$$a(x,u(x))\,\partial_x u = f(x,u(x)), \qquad u(x_0) = u_0. \tag{28}$$

Indeed, the recursive argument outlined above breaks
down in the case of the scalar equation $(x - x_0)\,\partial_x u =
f(x,u)$ for the simple reason that we cannot even
determine $\partial_x u(x_0)$ from the initial condition $u(x_0) =
u_0$. A similar problem occurs for the equation $(u -
u_0)\,\partial_x u = f(x,u)$. An obvious condition that allows
us to extend our previous recursive argument to (28)
is to insist that $a(x_0,u_0) \neq 0$. Otherwise, we say that
the IVP (28) is characteristic. If both $a$ and $f$ are also
real analytic, the Cauchy-Kowalewski theorem applies
again and we obtain a unique, real-analytic solution of
(28) in a small neighborhood of $x_0$. In the case of an
$N \times N$ system,

$$A(x,u(x))\,\partial_x u = F(x,u(x)), \qquad u(x_0) = u_0,$$

$A = A(x,u)$ is an $N \times N$ matrix, and the noncharacteristic
condition becomes

$$\det A(x_0,u_0) \neq 0. \tag{29}$$

It turns out, and this is extremely important in the 
development of the theory of ODEs, that, while the 
nondegeneracy condition (29) is essential to obtain a 
unique solution of the equation, the analyticity con- 
dition is not at all important: it can be replaced by a 
simple local Lipschitz condition for A and F. It suffices 
to assume, for example, that their first partial deriva- 
tives exist and that they are locally bounded. This is 
always the case if the first derivatives of A and F are 
continuous. 

Theorem (the fundamental theorem of ODEs). If the
matrix $A(x_0, u_0)$ is invertible and if $A$ and $F$ are
continuous and have locally bounded first derivatives, then
there is some time interval $J \subset \mathbb{R}$ that
contains $x_0$, and a unique solution$^{2}$ $u$ defined on $J$
that satisfies the initial conditions $u(x_0) = u_0$.

The proof of the theorem is based on the Picard iteration
method. The idea is to construct a sequence of approximate
solutions $u_{(n)}(x)$ that converge to the desired solution.
Without loss of generality we can assume $A$ to be the
identity matrix.$^{3}$ One starts by setting
$u_{(0)}(x) = u_0$ and then defines, recursively,

$$\partial_x u_{(n)}(x) = F(x, u_{(n-1)}(x)), \qquad u_{(n)}(x_0) = u_0.$$

Observe that at every stage all we need to solve is a very
simple linear problem, which makes Picard iteration easy to
implement numerically. As we shall see below, variations of
this method are also used for solving nonlinear PDEs.
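A minimal numerical sketch of the iteration, assuming that $A$ is the identity and that the problem is scalar: each step integrates the already known right-hand side $F(x, u_{(n-1)}(x))$, here with the trapezoidal rule on a uniform grid. The function name `picard_iterates`, the grid size, and the test problem $\partial_x u = u$, $u(0) = 1$ are choices made here for illustration and are not part of the text.

```python
import numpy as np

def picard_iterates(F, x0, u0, x_end, n_iter=20, n_grid=2001):
    """Approximate the solution of u' = F(x, u), u(x0) = u0 on [x0, x_end].

    Each pass computes u_n(x) = u0 + integral from x0 to x of F(s, u_{n-1}(s)) ds,
    using the trapezoidal rule for the integral.
    """
    x = np.linspace(x0, x_end, n_grid)
    u = np.full_like(x, u0, dtype=float)        # u_(0) is the constant u0
    for _ in range(n_iter):
        integrand = F(x, u)                      # known from the previous iterate
        steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)
        u = u0 + np.concatenate(([0.0], np.cumsum(steps)))
    return x, u

# Test problem with known solution exp(x).
x, u = picard_iterates(lambda x, u: u, x0=0.0, u0=1.0, x_end=1.0)
print(np.max(np.abs(u - np.exp(x))))             # discrepancy from exp(x)
```

Every pass is a plain quadrature, which is the sense in which each stage of the iteration is a "very simple linear problem".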

2. Since we are not assuming that $A$ and $F$ are analytic, the
solution may not be analytic, but it does have continuous first
derivatives.

3. Since $A$ is invertible we can multiply both sides of the
equation by the inverse matrix $A^{-1}$.

Remark. In general, the local existence theorem is sharp, in
the sense that its conditions cannot be relaxed. We have seen
that the invertibility condition for $A(x_0, u_0)$ is necessary.
Also, it is not always possible to extend the interval $J$ in
which the solution exists to the whole of the real line. As an
example, consider the nonlinear equation $\partial_x u = u^2$
with initial data $u = u_0$ at $x = 0$, for which the solution
$u = u_0/(1 - x u_0)$ becomes infinite in finite time: in the
terminology of PDEs, it blows up.
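The formula for the solution follows from separating variables. The short computation below is a standard derivation, spelled out only as a reminder; it assumes $u_0 \neq 0$ (for $u_0 = 0$ the solution is identically zero).

```latex
\begin{align*}
\partial_x u = u^2
  &\;\Longrightarrow\; \partial_x\!\left(-\frac{1}{u}\right) = 1 \\
  &\;\Longrightarrow\; -\frac{1}{u(x)} + \frac{1}{u_0} = x \\
  &\;\Longrightarrow\; u(x) = \frac{u_0}{1 - x u_0}.
\end{align*}
```

The denominator vanishes at $x = 1/u_0$, so for $u_0 > 0$ the solution exists only on $x < 1/u_0$, however smooth the right-hand side is.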

In view of the fundamental theorem and the example 
mentioned above, one can define the main goals of the 
mathematical theory of ODEs as follows. 

(i) Find criteria for global existence. In the case of 
blow-up describe the limiting behavior. 

(ii) In the case of global existence describe the asymp- 
totic behavior of solutions and families of solu- 
tions. 

Though it is impossible to develop a general theory 
that achieves both goals (in practice one is forced to 
restrict oneself to special classes of equations moti- 
vated by applications), the general local existence and 
uniqueness theorem mentioned above provides a pow- 
erful unifying theme. It would be very helpful if a 
similar situation were to hold for general PDEs. 

2.3 The Initial-Value Problem for PDEs 

In the one-dimensional situation one specifies initial
conditions at a point. The natural higher-dimensional
analogue is to specify them on hypersurfaces
$\mathcal{H} \subset \mathbb{R}^d$, that is,
$(d-1)$-dimensional subsets (or, to be more precise,
submanifolds). For a general equation of order $k$, that is,
one that involves $k$ derivatives, we need to specify the
values of $u$ and of its first $k - 1$ derivatives in the
direction normal to $\mathcal{H}$. For example, in the case
of the second-order wave equation (3) and the initial
hyperplane $t = 0$ we need to specify initial data for $u$
and $\partial_t u$.

If we wish to use initial data of this kind to start 
obtaining a solution, it is important that the data 
should not be degenerate. (We have already seen this 
in the case of ODEs.) For this reason, we make the 
following general definition. 

Definition. Suppose that we have a $k$th-order quasilinear
system of equations, and the initial data comes in the form of
the first $k - 1$ normal derivatives that a solution $u$ must
satisfy on a hypersurface $\mathcal{H}$. We say that the system
is noncharacteristic at a point $x_0$ of $\mathcal{H}$ if we
can use the initial data to determine formally all the other
higher partial derivatives of $u$ at $x_0$, in terms of the
data.


As a very rough picture to have in mind, it may be helpful
to imagine an infinitesimally small neighborhood of $x_0$. If
the hypersurface $\mathcal{H}$ is smooth, then its intersection
with this neighborhood will be a piece of a
$(d-1)$-dimensional affine subspace. The values of $u$ and the
first $k - 1$ normal derivatives on this intersection are given
by the initial data, and the problem of determining the other
partial derivatives is a problem in linear algebra (because
everything is infinitesimally small). To say that the system is
noncharacteristic at $x_0$ is to say that this linear algebra
problem can be uniquely solved, which is the case provided that
a certain matrix is invertible. This is the nondegeneracy
condition referred to earlier.

To illustrate the idea, let us look at first-order equations
in two space dimensions. In this case $\mathcal{H}$ is a curve
$\Gamma$, and since $k - 1 = 0$ we must specify the restriction
of $u$ to $\Gamma \subset \mathbb{R}^2$ but we do not have to
worry about any derivatives. Thus, we are trying to solve the
system

$$a^1(x, u(x))\,\partial_1 u(x) + a^2(x, u(x))\,\partial_2 u(x) = f(x, u(x)), \qquad u|_{\Gamma} = u_0, \tag{30}$$

where $a^1$, $a^2$, and $f$ are real-valued functions of $x$
(which belongs to $\mathbb{R}^2$) and $u$. Assume that in a
small neighborhood of a point $p$ the curve $\Gamma$ is
described parametrically as the set of points
$x = (x^1(s), x^2(s))$. We denote by $n(s) = (n_1(s), n_2(s))$
a unit normal to $\Gamma$.

As in the case of ODEs, which we looked at earlier, we would
like to find conditions on $\Gamma$ such that for a given point
in $\Gamma$ we can determine all derivatives of $u$ from the
data $u_0$, the derivatives of $u$ along $\Gamma$, and the
equation (30). Out of all possible curves $\Gamma$ we
distinguish in particular the characteristic ones we have
already encountered above (see (20)):


$$\frac{dx^1}{ds} = a^1(x(s), u(x(s))), \qquad \frac{dx^2}{ds} = a^2(x(s), u(x(s))), \qquad x(0) = p.$$

One can prove the following fact:

Along a characteristic curve, the equation (30) is degenerate.
That is, we cannot determine the first-order derivatives of $u$
uniquely in terms of the data $u_0$.

In terms of the rough picture above, at each point there is a
direction such that if the hypersurface, which in this case is
a line, is along that direction, then the resulting matrix is
singular. If you follow this direction, then you travel along a
characteristic curve.

Conversely, if the nondegeneracy condition

$$a^1(p, u(p))\,n_1(p) + a^2(p, u(p))\,n_2(p) \neq 0 \tag{31}$$

is satisfied at some point $p = x(0) \in \Gamma$, then we can
determine all higher derivatives of $u$ at $p$ uniquely in
terms of the data $u_0$ and its derivatives along $\Gamma$. If
the curve $\Gamma$ is given by the equation
$\psi(x^1, x^2) = 0$, with nonvanishing gradient
$D\psi(p) \neq 0$, then the condition (31) takes the form

$$a^1(p, u(p))\,\partial_1\psi(p) + a^2(p, u(p))\,\partial_2\psi(p) \neq 0.$$

With a little more work one can extend the above discussion
to higher-order equations in higher dimensions, and even to
systems of equations. Particularly important is the case of a
second-order scalar equation in $\mathbb{R}^d$,

$$\sum_{i,j=1}^{d} a^{ij}(x)\,\partial_i\partial_j u = f(x, u(x)), \tag{32}$$

together with a hypersurface $\mathcal{H}$ in $\mathbb{R}^d$
defined by the equation $\psi(x) = 0$, where $\psi$ is a
function with nonvanishing gradient $D\psi$. Define the unit
normal at a point $x_0 \in \mathcal{H}$ to be
$n = D\psi/|D\psi|$, or, in component form,
$n_i = \partial_i\psi/|\partial\psi|$. As initial conditions
for (32) we prescribe the values of $u$ and its normal
derivative
$n[u](x) = n_1(x)\,\partial_1 u(x) + n_2(x)\,\partial_2 u(x) + \cdots + n_d(x)\,\partial_d u(x)$
on $\mathcal{H}$:

$$u(x) = u_0(x), \qquad n[u](x) = u_1(x), \qquad x \in \mathcal{H}.$$

It can be shown that $\mathcal{H}$ is noncharacteristic (with
respect to equation (32)) at a point $p$ (that is, we can
determine all derivatives of $u$ at $p$ in terms of the initial
data $u_0$, $u_1$) if and only if

$$\sum_{i,j=1}^{d} a^{ij}(p)\,\partial_i\psi(p)\,\partial_j\psi(p) \neq 0. \tag{33}$$

On the other hand, $\mathcal{H}$ is a characteristic
hypersurface for (32) if

$$\sum_{i,j=1}^{d} a^{ij}(x)\,\partial_i\psi(x)\,\partial_j\psi(x) = 0 \tag{34}$$

for every $x$ in $\mathcal{H}$.


Example. If the coefficients $a^{ij}$ of (32) satisfy the
condition

$$\sum_{i,j=1}^{d} a^{ij}(x)\,\xi_i\xi_j > 0, \qquad \forall \xi \in \mathbb{R}^d \setminus \{0\},\ \forall x \in \mathbb{R}^d, \tag{35}$$

then clearly, by (34), no surface in $\mathbb{R}^d$ can be
characteristic. This is the case, in particular, for the
Laplace equation $\Delta u = f$. Consider also the
minimal-surface equation (7) written in the form

$$\sum_{i,j=1,2} h^{ij}(\partial u)\,\partial_i\partial_j u = 0, \tag{36}$$

with $h^{11}(\partial u) = 1 + (\partial_2 u)^2$,
$h^{22}(\partial u) = 1 + (\partial_1 u)^2$, and
$h^{12}(\partial u) = h^{21}(\partial u) = -\partial_1 u\,\partial_2 u$.
It is easy to check that the quadratic form associated with
the symmetric matrix $h^{ij}(\partial u)$ is positive definite
for every $\partial u$. Indeed, for $\xi \neq 0$,

$$\sum_{i,j=1,2} h^{ij}(\partial u)\,\xi_i\xi_j = (1 + |\partial u|^2)\,|\xi|^2 - (\xi \cdot \partial u)^2 \geq |\xi|^2 > 0.$$

Thus, even though (36) is not linear, we see that all
surfaces in $\mathbb{R}^2$ are noncharacteristic.
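The positivity claim can be checked numerically. The sketch below builds the matrix with entries $h^{11} = 1 + p_2^2$, $h^{22} = 1 + p_1^2$, $h^{12} = h^{21} = -p_1 p_2$ for a gradient $p = \partial u$, and compares the quadratic form with $(1 + |p|^2)|\xi|^2 - (\xi \cdot p)^2$, which is what one obtains by expanding those entries. The function names and the random sampling are assumptions introduced here for illustration.

```python
import numpy as np

def minimal_surface_matrix(p):
    """Return the 2x2 matrix h^{ij}(p) of (36) for a gradient p = (d1 u, d2 u)."""
    p1, p2 = p
    return np.array([[1.0 + p2**2, -p1 * p2],
                     [-p1 * p2, 1.0 + p1**2]])

def quadratic_form(p, xi):
    """Evaluate the sum over i, j of h^{ij}(p) * xi_i * xi_j."""
    xi = np.asarray(xi, dtype=float)
    return float(xi @ minimal_surface_matrix(p) @ xi)

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.normal(scale=5.0, size=2)    # a sample gradient
    xi = rng.normal(size=2)              # a sample covector
    lhs = quadratic_form(p, xi)
    rhs = (1.0 + p @ p) * (xi @ xi) - (xi @ p) ** 2
    assert np.isclose(lhs, rhs)          # the two expressions agree
    assert lhs >= xi @ xi - 1e-9         # bounded below by |xi|^2
```

The eigenvalues of the matrix are $1$ and $1 + |p|^2$, so the form is positive definite for every gradient, with no smallness assumption on $\partial u$.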

Example. Consider the wave equation $\Box u = f$ in
$\mathbb{R}^{1+d}$. All hypersurfaces of the form
$\psi(t, x) = 0$ for which

$$(\partial_t\psi)^2 = \sum_{i=1}^{d} (\partial_i\psi)^2 \tag{37}$$

are characteristic. This is the famous eikonal equation,
which plays a fundamental role in the study of wave
propagation. Observe that it splits into two Hamilton-Jacobi
equations (see (24)):

$$\partial_t\psi = \pm\Big(\sum_{i=1}^{d} (\partial_i\psi)^2\Big)^{1/2}. \tag{38}$$

The bicharacteristic curves of the associated Hamiltonians
are called bicharacteristic curves of the wave equation. As
particular solutions of (37) we find
$\psi_+(t, x) = (t - t_0) + |x - x_0|$ and
$\psi_-(t, x) = (t - t_0) - |x - x_0|$, whose level surfaces
$\psi_\pm = 0$ correspond to forward and backward light cones
with their vertex at $p = (t_0, x_0)$. These represent,
physically, the union of all light rays emanating from a point
source at $p$. The light rays are given by the equation
$(t - t_0)\,\omega = (x - x_0)$, for $\omega \in \mathbb{R}^3$
with $|\omega| = 1$, and are precisely the $(t, x)$ components
of the bicharacteristic curves of the Hamilton-Jacobi
equations (38). More generally, the characteristics of the
linear wave equation

$$a^{00}(t, x)\,\partial_t^2 u - \sum_{i,j} a^{ij}(t, x)\,\partial_i\partial_j u = 0, \tag{39}$$

with $a^{00} > 0$ and $a^{ij}$ satisfying (35), are given by
the Hamilton-Jacobi equations:

$$-a^{00}(t, x)\,(\partial_t\psi)^2 + \sum_{i,j} a^{ij}(t, x)\,\partial_i\psi\,\partial_j\psi = 0$$

or, equivalently,

$$\partial_t\psi = \pm\Big((a^{00})^{-1}\sum_{i,j} a^{ij}(t, x)\,\partial_i\psi\,\partial_j\psi\Big)^{1/2}. \tag{40}$$

The bicharacteristics of the corresponding Hamiltonian
systems are called bicharacteristic curves of (39).
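That the two cone functions solve the eikonal equation (37) is a one-line check, recorded here for convenience. It is valid away from the vertex, where $|x - x_0| \neq 0$, and the upper and lower signs correspond to $\psi_+$ and $\psi_-$.

```latex
\begin{align*}
\psi_\pm(t,x) = (t - t_0) \pm |x - x_0|
  &\;\Longrightarrow\; \partial_t \psi_\pm = 1, \qquad
     \partial_i \psi_\pm = \pm \frac{x^i - x_0^i}{|x - x_0|}, \\
  &\;\Longrightarrow\; \sum_{i=1}^{d} (\partial_i \psi_\pm)^2
     = \frac{|x - x_0|^2}{|x - x_0|^2} = 1 = (\partial_t \psi_\pm)^2 .
\end{align*}
```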


Remark. In the case of the first-order scalar equations
(17) we have seen how knowledge of characteristics can be
used to find, implicitly, general solutions. We have also seen
that singularities propagate only along characteristics. In
the case of second-order equations the characteristics are not
sufficient to solve the equations, but they continue to
provide important information, such as how the singularities
propagate. For example, in the case of the wave equation
$\Box u = 0$ with smooth initial data $u_0$, $u_1$ everywhere
except at a point $p = (t_0, x_0)$, the solution $u$ has
singularities present at all points of the light cone
$-(t - t_0)^2 + |x - x_0|^2 = 0$ with vertex at $p$. A more
refined version of this fact shows that the singularities
propagate along bicharacteristics. The general principle here
is that singularities propagate along characteristic
hypersurfaces of a PDE. Since this is a very important
principle, it pays to give it a more precise formulation that
extends to general boundary conditions, such as the Dirichlet
condition for (1).

Propagation of singularities. If the boundary conditions
or the coefficients of a PDE are singular at some point $p$,
and otherwise smooth (or real analytic) everywhere in some
small neighborhood $V$ of $p$, then a solution of the equation
cannot be singular in $V$ except along a characteristic
hypersurface passing through $p$. In particular, if there are
no such characteristic hypersurfaces, then any solution of the
equation must be smooth (or real analytic) at every point of
$V$ other than $p$.
Remarks. (i) The heuristic principle mentioned above is
invalid, in general, at large scales. Indeed, as we have
shown in the case of the Burger equation, solutions to
nonlinear evolution equations can develop new singularities
whatever the smoothness of the initial conditions. Global
versions of the principle can be formulated for linear
equations based on the bicharacteristics of the equation. See
(iii) below.

(ii) According to the principle, it follows that any solution
of the equation $\Delta u = f$, satisfying the boundary
condition $u|_{\partial D} = u_0$ with a boundary value $u_0$
that merely has to be continuous, is automatically smooth
everywhere in the interior of $D$ provided that $f$ itself is
smooth there. Moreover, the solution is real analytic if $f$
is real analytic.

(iii) More precise versions of this principle, which plays a
fundamental role in the general theory, can be given for
linear equations. In the case of the general wave equation
(39), for example, one can show that singularities propagate
along bicharacteristics. These are the bicharacteristic curves
associated with the Hamilton-Jacobi equation (40).

2.4 The Cauchy-Kowalewski Theorem 

In the case of ODEs we have seen that a noncharacteristic
IVP always admits solutions locally (that is, in some time
interval about a given point). Is there a higher-dimensional
analogue of this fact? The answer is yes, provided that we
restrict ourselves to the real-analytic situation, which is
covered by an appropriate extension of the Cauchy-Kowalewski
theorem. More precisely, one can consider general quasilinear
equations, or systems, with real-analytic coefficients,
real-analytic hypersurfaces $\mathcal{H}$, and appropriate
real-analytic initial data on $\mathcal{H}$.

Theorem (Cauchy-Kowalewski (CK)). If all the real-analyticity
conditions made above are satisfied and if the initial
hypersurface $\mathcal{H}$ is noncharacteristic at
$x_0$,$^{4}$ then in some neighborhood of $x_0$ there is a
unique real-analytic solution $u(x)$ that satisfies the system
of equations and the corresponding initial conditions.

In the special case of linear equations, an important
companion theorem, due to Holmgren, asserts that the analytic
solution given by the CK theorem is unique in the class of all
smooth solutions and smooth noncharacteristic hypersurfaces
$\mathcal{H}$. The CK theorem shows that, given the
noncharacteristic condition and the analyticity assumptions,
the following straightforward way of finding solutions works:
look for a formal expansion of the kind
$u(x) = \sum_{\alpha} C_{\alpha}(x - x_0)^{\alpha}$ by
determining the constants $C_{\alpha}$ recursively from simple
algebraic formulas arising from the equation and initial
conditions on $\mathcal{H}$. More precisely, the theorem
ensures that the naive expansion obtained in this way
converges in a small neighborhood of $x_0 \in \mathcal{H}$.
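In the simplest one-dimensional setting the recursive determination of the coefficients can be carried out explicitly. The sketch below is an illustration only, not the general procedure of the theorem: it takes the ODE $\partial_x u = u^2$, $u(0) = u_0$ considered earlier, writes $u(x) = \sum_n c_n x^n$, and matches powers of $x$ to obtain $(n+1)\,c_{n+1} = \sum_{k=0}^{n} c_k c_{n-k}$. The function name `taylor_coefficients` is introduced here and is not from the text.

```python
from fractions import Fraction

def taylor_coefficients(u0, n_terms):
    """Coefficients c_0..c_{n_terms-1} of the formal series solving u' = u^2, u(0) = u0.

    Matching the coefficient of x^n on both sides gives
    (n + 1) * c_{n+1} = sum_{k=0}^{n} c_k * c_{n-k}.
    """
    c = [Fraction(u0)]
    for n in range(n_terms - 1):
        cauchy_product = sum(c[k] * c[n - k] for k in range(n + 1))
        c.append(cauchy_product / (n + 1))
    return c

coeffs = taylor_coefficients(2, 6)
# The closed form u0 / (1 - x*u0) has Taylor coefficients u0**(n+1).
assert coeffs == [Fraction(2) ** (n + 1) for n in range(6)]
```

The coefficients grow like $u_0^{n+1}$, so the series converges only for $|x| < 1/|u_0|$: the neighborhood promised by the theorem is genuinely local, in line with the blow-up of this solution at $x = 1/u_0$.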

It turns out, however, that the analyticity conditions
required by the CK theorem are much too restrictive, and
therefore the apparent generality of the result is misleading.
A first limitation becomes immediately apparent when we
consider the wave equation $\Box u = 0$. A fundamental feature
of this equation is finite speed of propagation, which means,
roughly speaking, that if at some time $t$ a solution $u$ is
zero outside some bounded set, then the same must be true at
all later times.


4. For second-order equations of the kind of (32), this is precisely
condition (33).


However, analytic functions cannot have this property unless
they are identically zero (see SOME FUNDAMENTAL MATHEMATICAL
DEFINITIONS [1.3 §5.6]). Therefore, it is impossible to discuss
the wave equation properly within the class of real-analytic
solutions. A related problem, first pointed out by HADAMARD
[VI.65], concerns the impossibility of solving the Cauchy
problem, in many important cases, for arbitrary smooth
nonanalytic data. Consider, for example, the Laplace equation
$\Delta u = 0$ in $\mathbb{R}^d$. As we have established
above, any hypersurface $\mathcal{H}$ is noncharacteristic, yet
the Cauchy problem $u|_{\mathcal{H}} = u_0$,
$n[u]|_{\mathcal{H}} = u_1$, for arbitrary smooth initial
conditions $u_0$, $u_1$, may admit no local solutions in a
neighborhood of any point of $\mathcal{H}$. Indeed, take
$\mathcal{H}$ to be the hyperplane $x^1 = 0$ and assume that
the Cauchy problem can be solved for given nonanalytic smooth
data in a domain that includes a closed ball $B$ centered at
the origin. The corresponding solution can also be interpreted
as the solution to the Dirichlet problem in $B$, with the
values of $u$ prescribed on the boundary $\partial B$. But
this, according to our heuristic principle (which can easily
be made rigorous in this case), must be real analytic
everywhere in the interior of $B$, contradicting our
assumptions about the initial data.

On the other hand, the Cauchy problem for the wave equation
$\Box u = 0$ in $\mathbb{R}^{d+1}$ has a unique solution for
any smooth initial data $u_0$, $u_1$ that is prescribed on a
spacelike hypersurface. This means a hypersurface
$\psi(t, x) = 0$ such that at every point $p = (t_0, x_0)$
that belongs to it the normal vector at $p$ lies inside the
light cone (either in the future direction or in the past
direction). To say this analytically,

$$|\partial_t\psi(p)| > \Big(\sum_{i=1}^{d} |\partial_i\psi(p)|^2\Big)^{1/2}. \tag{41}$$

This condition is clearly satisfied by a hyperplane of the
form $t = t_0$, but any other hypersurface close to this is
also spacelike. By contrast, the IVP is ill-posed for a
timelike hypersurface, i.e., a hypersurface for which

$$|\partial_t\psi(p)| < \Big(\sum_{i=1}^{d} |\partial_i\psi(p)|^2\Big)^{1/2}.$$

That is, we cannot, for general non-real-analytic initial
conditions, find a solution of the IVP. An example of a
timelike hypersurface is given by the hyperplane $x^1 = 0$.
Let us explain the term “ill-posed” more precisely.

Definition. A given problem for a PDE is said to be
well-posed if both existence and uniqueness of solutions can
be established for arbitrary data that belongs to a specified
large space of functions, which includes the class of smooth
functions.$^{5}$ Moreover, the solutions must depend
continuously on the data. A problem that is not well-posed is
called ill-posed.

The continuous dependence on the data is very im- 
portant. Indeed, the IVP would be of little use if very 
small changes in the initial conditions resulted in very 
large changes in the corresponding solutions. 
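A classical illustration of why continuous dependence matters, usually attributed to Hadamard, is the Cauchy problem for the Laplace equation in the plane. The sketch below is not taken from the text above; it uses the standard family $u_n(x, y) = n^{-2}\sin(nx)\sinh(ny)$, which is harmonic and has Cauchy data $u_n(x, 0) = 0$, $\partial_y u_n(x, 0) = n^{-1}\sin(nx)$ on the line $y = 0$. The helper names are assumptions made for illustration.

```python
import math

def cauchy_data_size(n):
    """Sup norm of the Cauchy data (0, sin(n x) / n) on the line y = 0."""
    return 1.0 / n

def solution_size(n, y):
    """Sup norm over x of u_n(x, y) = sin(n x) * sinh(n y) / n**2."""
    return math.sinh(n * y) / n**2

# The data shrink as n grows, while the solution at fixed y > 0 grows rapidly.
for n in (1, 10, 50):
    print(n, cauchy_data_size(n), solution_size(n, y=0.5))
```

The data tend to zero as $n \to \infty$, yet at any fixed height $y > 0$ the solutions grow exponentially in $n$, so no continuity estimate of the solution in terms of the data can hold.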

2.5 Standard Classification 

The different behavior of the Laplace and wave equa- 
tions mentioned above illustrates the fundamental dif- 
ference between ODEs and PDEs and the illusory gener- 
ality of the CK theorem. Given that these two equations 
are so important in geometric and physical applica- 
tions, it is of great interest to find the broadest classes 
of equations with which they share their main proper- 
ties. The equations modeled by the Laplace equation 
are called elliptic, while those modeled by the wave 
equation are called hyperbolic. The other two impor- 
tant models are the heat equation (see (2)) and the 
Schrödinger equation (see (4)). The general classes of
equations that they resemble are called parabolic and 
dispersive, respectively. 

Elliptic equations are the most robust and the eas- 
iest to characterize: they are the ones that admit no 
characteristic hypersurfaces. 

Definition. A linear, or quasilinear, $N \times N$ system with
no characteristic hypersurfaces is called elliptic.

5. Here we are necessarily vague. A precise space can be specified
in each given case.

Equations of type (32) whose coefficients $a^{ij}$ satisfy
condition (35) are clearly elliptic. The minimal-surface
equation (7) is also elliptic. It is also easy to verify that
the Cauchy-Riemann system (12) is elliptic. As was pointed out
by Hadamard, the IVP is not well-posed for elliptic equations.
The natural way of parametrizing the set of solutions to an
elliptic PDE is to prescribe conditions for $u$, and some of
its derivatives (the number of derivatives will be roughly
half the order of the equation) at the boundary of a domain
$D \subset \mathbb{R}^n$. These are called boundary-value
problems (BVPs). A typical example is the Dirichlet boundary
condition $u|_{\partial D} = u_0$ for the Laplace equation
$\Delta u = 0$ in a domain $D \subset \mathbb{R}^n$. One can
show that, if the domain $D$ satisfies certain mild regularity
assumptions and the boundary value $u_0$ is continuous, then
this problem admits a unique solution that depends
continuously on $u_0$. We say that the Dirichlet problem for
the Laplace equation is well-posed. Another well-posed problem
for the Laplace equation is given by the Neumann boundary
condition $n[u]|_{\partial D} = f$, where $n$ is the exterior
unit normal to the boundary. This problem is well-posed for
all continuous functions $f$ defined on $\partial D$ with zero
mean. A typical problem of general theory is to classify all
well-posed BVPs for a given elliptic system.

As a consequence of our propagation-of-singularities
principle, we deduce, heuristically at least, the following
general fact:

Classical solutions of elliptic equations with smooth
(or real-analytic) coefficients in a regular domain $D$ are
smooth (or real analytic) in the interior of $D$, whatever
the degree of smoothness of the boundary conditions.$^{6}$

Hyperbolic equations are, essentially, those for which
the IVP is well-posed. In that sense, they provide the 
natural class of equations for which one can prove 
a result similar to the local existence theorem for 
ODEs. More precisely, for each sufficiently regular set 
of initial conditions there is a unique solution. We can 
thus think of the Cauchy problem as a natural way of 
parametrizing the set of all solutions to the equations. 

The definition of hyperbolicity depends, however, on the
particular hypersurface we are considering as the initial
hypersurface. Thus, in the case of the wave equation
$\Box u = 0$, the standard IVP

$$u(0, x) = u_0(x), \qquad \partial_t u(0, x) = u_1(x)$$

is well-posed. This means that for any smooth initial data
$u_0$, $u_1$ we can find a unique solution of the equation,
which depends continuously on $u_0$, $u_1$. As we have already
mentioned, the IVP for $\Box u = 0$ remains well-posed if we
replace the initial hypersurface $t = 0$ by any spacelike
hypersurface $\psi(t, x) = 0$ (see (41)). However, it fails to
be well-posed for timelike hypersurfaces, for which there may
be no solution with prescribed, nonanalytic, Cauchy data.

6. Provided that the boundary condition under consideration is
well-posed. Moreover, this heuristic principle holds, in general, only
for classical solutions of a nonlinear equation. There are in fact
examples of well-posed BVPs, for certain nonlinear elliptic systems,
with no classical solutions.

It is more difficult to give algebraic conditions for
hyperbolicity. Roughly speaking, hyperbolic equations are at
the opposite end of the spectrum from elliptic equations:
whereas elliptic equations have no characteristic
hypersurfaces, hyperbolic equations have as many as possible
passing through any given point. One of the most useful
classes of hyperbolic equations, which includes most of the
important known examples, consists of equations of the form

$$A^0(t, x, u)\,\partial_t u + \sum_{i=1}^{d} A^i(t, x, u)\,\partial_i u = F(t, x, u), \qquad u|_{\mathcal{H}} = u_0, \tag{42}$$

where all the coefficients $A^0, A^1, \ldots, A^d$ are
symmetric $N \times N$ matrices and $\mathcal{H}$ is given by
$\psi(t, x) = 0$. Such a system is well-posed provided that
the matrix

$$A^0(t, x, u)\,\partial_t\psi(t, x) + \sum_{i=1}^{d} A^i(t, x, u)\,\partial_i\psi(t, x) \tag{43}$$

is positive definite. A system (42) that satisfies these
conditions is called symmetric hyperbolic. In the particular
case when $\psi(t, x) = t$, the condition (43) becomes

$$(A^0\xi, \xi) \geq c\,|\xi|^2 \qquad \forall \xi \in \mathbb{R}^N.$$
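As a concrete instance of condition (43), consider (as an assumption made here, not an example from the text) the one-dimensional wave equation $\partial_t^2\phi = \partial_x^2\phi$ rewritten for $u = (\partial_t\phi, \partial_x\phi)$ as $A^0\,\partial_t u + A^1\,\partial_x u = 0$, with $A^0$ the identity and $A^1$ minus the matrix that swaps the two components. The sketch below tests positive definiteness of the matrix (43) numerically for a given gradient of $\psi$; the function name is hypothetical.

```python
import numpy as np

# First-order form of the 1D wave equation: with v = phi_t and w = phi_x,
# v_t = w_x and w_t = v_x, i.e. A0 u_t + A1 u_x = 0 for u = (v, w).
A0 = np.eye(2)
A1 = -np.array([[0.0, 1.0],
                [1.0, 0.0]])

def is_positive_definite_43(dt_psi, dx_psi, tol=1e-12):
    """Check whether A0 * dt_psi + A1 * dx_psi, the matrix in (43), is positive definite."""
    M = A0 * dt_psi + A1 * dx_psi
    return bool(np.linalg.eigvalsh(M).min() > tol)

print(is_positive_definite_43(1.0, 0.0))   # hyperplane t = const
print(is_positive_definite_43(0.5, 1.0))   # normal pointing mostly in x
```

The eigenvalues of the matrix are $\partial_t\psi \pm \partial_x\psi$, so for this system the condition holds exactly when $\partial_t\psi > |\partial_x\psi|$, that is, for spacelike hypersurfaces whose normal points toward the future.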

The following is a fundamental result in the theory 
of general hyperbolic equations. It is called the local 
existence and uniqueness of solutions for symmetric 
hyperbolic systems. 

Theorem (fundamental theorem for hyperbolic equations). The
IVP (42) is locally well-posed for symmetric hyperbolic
systems with sufficiently smooth $A$, $F$, and $\mathcal{H}$
and sufficiently smooth initial conditions $u_0$. In other
words, if the appropriate smoothness conditions are satisfied,
then for any point $p \in \mathcal{H}$ there is a small
neighborhood$^{7}$ $D$ of $p$ inside which there is a unique,
continuously differentiable solution $u$.

Remarks. (i) The local character of the theorem is
essential, just as it was for the general
propagation-of-singularities principle discussed earlier,
since the result cannot be globalized in the particular case
of the Burger equation (21), which fits trivially into the
framework of general nonlinear symmetric hyperbolic systems. A
precise version of the theorem above gives a lower bound on
how large $D$ can be.

(ii) The proof of the theorem is based on a variation of the
Picard iteration method that we encountered earlier for ODEs.
One starts by taking $u_{(0)} = u_0$ in a neighborhood of
$\mathcal{H}$. Then one defines functions $u_{(n)}$
recursively as follows:

$$A^0(t, x, u_{(n-1)})\,\partial_t u_{(n)} + \sum_{i=1}^{d} A^i(t, x, u_{(n-1)})\,\partial_i u_{(n)} = F(t, x, u_{(n-1)}), \qquad u_{(n)}|_{\mathcal{H}} = u_0.$$


7. By “point” we mean that $p$ is a spacetime point
$(t, x) \in \mathbb{R}^{1+d}$. Similarly, $D$ is a set of spacetime
points.


Notice that at each stage of the iteration we have to 
solve a linear equation. Linearization is an extremely 
important tool in studying nonlinear PDEs. We can 
almost never understand their behavior without lin- 
earizing them around important special solutions. 
Thus, almost invariably, hard problems in nonlinear 
PDEs reduce to understanding specific problems in 
linear PDEs. 

(iii) To implement the Picard iteration method we need to
get precise estimates concerning $u_{(n)}$ in terms of
$u_{(n-1)}$. This step requires energy-type a priori
estimates, which we will discuss in section 3.3.

Another important property of hyperbolic equations (which
is not shared by elliptic, parabolic, or dispersive equations)
is finite speed of propagation, which was mentioned earlier in
the case of the wave equation (3). Consider this simple case
again. The IVP can be solved explicitly by the so-called
Kirchhoff formula. The formula allows us to conclude that if
the initial data at $t = 0$ is zero outside a ball $B_a(x_0)$
of radius $a > 0$ centered at $x_0 \in \mathbb{R}^3$, then at
time $t > 0$ the solution $u$ is zero outside the ball
$B_{a+t}(x_0)$. In general, finite speed of propagation can
best be formulated in terms of domains of dependence and
influence of hyperbolic equations (see the online version for
general definitions).
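The same phenomenon is easy to see in one space dimension, where d'Alembert's formula plays the role of the Kirchhoff formula. The sketch below assumes zero initial velocity and unit propagation speed, so that $u(t, x) = \tfrac{1}{2}\,(g(x - t) + g(x + t))$ solves $\partial_t^2 u = \partial_x^2 u$ with $u(0, x) = g(x)$; the bump `g` and the radius `a` are hypothetical choices made for illustration.

```python
import numpy as np

a = 1.0  # the initial data vanish outside the interval [-a, a]

def g(x):
    """Smooth bump supported in [-a, a]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < a
    out[inside] = np.exp(-1.0 / (a**2 - x[inside] ** 2))
    return out

def u(t, x):
    """d'Alembert solution with initial data (g, 0) and unit speed."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (g(x - t) + g(x + t))

x = np.linspace(-6.0, 6.0, 2401)
t = 2.0
outside = np.abs(x) > a + t
assert np.all(u(t, x)[outside] == 0.0)    # zero outside the interval of radius a + t
assert np.any(u(t, x)[~outside] > 0.0)    # but not identically zero inside it
```

The support of the solution at time $t$ is contained in the interval of radius $a + t$, which is the one-dimensional counterpart of the ball $B_{a+t}(x_0)$ above.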

Hyperbolic PDEs play a fundamental role in physics, as they
are intimately tied to the relativistic nature of the modern
theory of fields. Equations (3), (5), (13) are the simplest
examples of linear field theories, and they are manifestly
hyperbolic. Other basic examples appear in gauge field
theories such as MAXWELL'S EQUATIONS [IV.13 §1.1]
$\partial^{\alpha}F_{\alpha\beta} = 0$ or the Yang-Mills
equations $D^{\alpha}F_{\alpha\beta} = 0$. Finally, the
Einstein equations (14) are also hyperbolic.$^{8}$ Other
important examples of hyperbolic equations arise in the
physics of elasticity and inviscid fluids. As examples of the
latter, the Burger equation (21) and the compressible Euler
equation are hyperbolic.

Elliptic equations, on the other hand, appear naturally in
describing time-independent, or more generally steady-state,
solutions of hyperbolic equations. Elliptic equations can
also be derived, directly, by well-defined
VARIATIONAL PRINCIPLES [III.96].

8. For gauge theories and Einstein equations the notion of
hyperbolicity depends on the choice of gauge or coordinates. In the
case of the Yang-Mills equations, for example, one obtains a
well-defined system of nonlinear wave equations only in the Lorentz
gauge.

Finally, a few words about parabolic equations and
Schrödinger-type equations, which are intermediate between the
elliptic and hyperbolic ones. Large classes of useful
equations of these types are given by

$$\partial_t u - Lu = f \tag{44}$$

and

$$i\,\partial_t u + Lu = f, \tag{45}$$

respectively, where $L$ is an elliptic second-order operator.
One looks for solutions $u = u(t, x)$, defined for
$t \geq t_0$, with the prescribed initial condition

$$u(t_0, x) = u_0(x) \tag{46}$$

on the hypersurface $t = t_0$. Strictly speaking, this
hypersurface is characteristic, since the order of the
equation is 2 and we cannot determine $\partial_t^2 u$ at
$t = t_0$ directly from the equation. Yet this is not a
serious problem; we can still determine $\partial_t^2 u$
formally by differentiating the equation with respect to $t$.
Thus, the IVP (44) (or (45)) with initial condition (46) is
well-posed, but not quite in the same sense as for hyperbolic
equations. For example, the heat equation
$-\partial_t u + \Delta u = 0$ is well-posed for positive $t$
but ill-posed for negative $t$. The heat equation may also not
have unique solutions for the IVP unless we make assumptions
about how fast the initial data is allowed to grow at
infinity. One can also show that the characteristic
hypersurfaces of the equation (44) are all of the form
$t = \text{constant}$, and therefore parabolic equations are
quite similar to elliptic equations. For example, one can show
that if the coefficients $a^{ij}$ and $f$ are smooth (or real
analytic), then the solution $u$ must be smooth (or real
analytic in $x$) for $t > t_0$ even if the initial data $u_0$
is not smooth, which is consistent with our
propagation-of-singularities principle. The heat equation
smooths out initial conditions. It is for this reason that the
heat equation is useful in many applications. In physics,
parabolic PDEs arise whenever diffusion or dissipation
phenomena are important, while in geometry and calculus of
variations, parabolic PDEs often arise as gradient flows of
positive-definite functionals. Ricci flow (16) can also be
viewed as a parabolic PDE, after a suitable change of
coordinates.
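The asymmetry between forward and backward time can be read off from a single Fourier mode. As a sketch (assuming the constant-coefficient heat equation $\partial_t u = \partial_x^2 u$ on the line, with the helper name `mode_amplitude` introduced here for illustration), the solution with initial data $e^{ikx}$ is $e^{-k^2 t}e^{ikx}$, so high frequencies are damped for $t > 0$ and amplified without bound for $t < 0$.

```python
import math

def mode_amplitude(k, t):
    """Amplitude at time t of the heat-equation solution with data exp(i k x)."""
    return math.exp(-(k**2) * t)

# Forward in time the high modes decay fastest; backward in time they grow fastest.
for k in (1, 5, 10):
    print(k, mode_amplitude(k, 0.1), mode_amplitude(k, -0.1))
```

Forward in time the damping factor $e^{-k^2 t}$ is what smooths rough data; backward in time the same factor magnifies arbitrarily small high-frequency perturbations, which is the failure of continuous dependence.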

Dispersive PDEs, of which the Schrödinger equation (4) is a
fundamental example, are evolution equations that behave
analogously to hyperbolic PDEs in many respects. For instance,
the IVP tends to be locally well-posed both forward and
backward in time. However, solutions to dispersive PDEs do not
propagate along characteristic surfaces. Instead, they move at
speeds that are determined by their spatial frequency; in
general, high-frequency waves tend to propagate at much
greater speeds than low-frequency waves, which eventually
leads to a dispersion of the solution into increasingly large
areas of space. In fact, the speed of propagation of solutions
is typically infinite. This behavior also differs from that of
parabolic equations, which tend to dissipate the
high-frequency components of a solution (sending them to zero)
rather than dispersing them. In physics, dispersive equations
arise in quantum mechanics: they are the nonrelativistic limit
$c \to \infty$ of relativistic equations and they are also
approximations to model certain types of fluid behavior. For
instance, the KORTEWEG-DE VRIES EQUATION [III.51],

$$\partial_t u + \partial_x^3 u = 6u\,\partial_x u,$$

is a dispersive PDE that models the behavior of
small-amplitude waves in a shallow canal.
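The dependence of speed on frequency can be made explicit with plane waves. The computation below is a standard sketch, stated for the free Schrödinger equation $i\,\partial_t u + \partial_x^2 u = 0$ in one space dimension and for the linear part $\partial_t u + \partial_x^3 u = 0$ of the Korteweg-de Vries equation; these two model cases are chosen here for illustration.

```latex
\begin{align*}
u(t,x) = e^{i(kx - \omega t)}:\qquad
& i\,\partial_t u + \partial_x^2 u = (\omega - k^2)\,u = 0
  \;\Longrightarrow\; \omega = k^2, \quad \frac{d\omega}{dk} = 2k, \\
& \partial_t u + \partial_x^3 u = -i\,(\omega + k^3)\,u = 0
  \;\Longrightarrow\; \omega = -k^3, \quad \frac{d\omega}{dk} = -3k^2.
\end{align*}
```

In both cases the group velocity $d\omega/dk$ grows in magnitude with $|k|$, so wave packets of different frequencies separate, and there is no finite bound on the speed.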

2.6 Special Topics for Linear Equations 

The greatest successes of the general theory have been 
in connection with linear equations, especially those 
with constant coefficients, for which Fourier analysis 
provides an extremely powerful tool. While the related 
issues of classification, well-posedness, and propaga- 
tion of singularities have dominated the study of lin- 
ear equations, there are other issues of interest as well, 
including the following. 

2.6.1 Local Solvability 

This is the problem of determining the conditions on a linear
operator $T$ and given data $f$ under which the equation (9) is
locally solvable. The Cauchy-Kowalewski theorem gives a criterion
for local solvability when $f$ and the coefficients of $T$ are
real analytic, but it is a remarkable phenomenon that when one
relaxes this assumption slightly, asking for $f$ to be smooth
rather than real analytic, serious obstructions to local
solvability appear. For instance, the Lewy operator

$$T[u](t,z) = \frac{\partial u}{\partial \bar z}(t,z) - iz\,\frac{\partial u}{\partial t}(t,z),$$

defined on complex-valued functions
$u : \mathbb{R} \times \mathbb{C} \to \mathbb{C}$, has the property
that equation (9) is locally solvable for real-analytic $f$ but
not for “most” smooth $f$. The
Lewy operator is intimately connected to the tangential
Cauchy-Riemann equations on the Heisenberg group in
$\mathbb{C}^2$. It was discovered in the study of the restriction
of the two-dimensional analogue of the Cauchy-Riemann operator T
to a quadric in $\mathbb{C}^2$. This example was the starting
point for the theory of local solvability, whose goal is to
characterize linear equations that are locally solvable. The
theory of Cauchy-Riemann manifolds, which has its origin in the
study of restrictions of the Cauchy-Riemann equations (in higher
dimensions) to real hypersurfaces, each of which comes with an
associated “tangential Cauchy-Riemann complex,” is another
extremely rich source of examples of interesting linear PDEs that
do not fit into the standard classification.

2.6.2 Unique Continuation 

This concerns various ill-posed problems where solu- 
tions may not always exist, but one still has unique- 
ness. A fundamental example is that of analytic con- 
tinuation: two holomorphic functions on a connected 
domain D that agree on a nondiscrete set (such as a disk 
or an interval) must necessarily agree everywhere on D. 
This fact can be viewed as a unique continuation result
for the Cauchy-Riemann equations (12). Another exam- 
ple in a similar spirit is Holmgren’s theorem, which 
asserts that solutions to a linear PDE (9) that has real- 
analytic coefficients and data are unique, even in the 
class of smooth functions. More generally, the study of 
ill-posed problems (such as the wave equation with pre- 
scribed data on a timelike surface rather than a space- 
like one) arises naturally in connection with control 
theory. 

2.6.3 Spectral Theory 

There is no way I can even begin to give an account 
of this theory, which is of fundamental importance 
not only to quantum mechanics and other physical 
theories, but also to geometry and analytic number 
theory [IV.2]. Just as a matrix $A$ can often be analyzed
through its eigenvalues and eigenvectors [I.3 §4.3] by the tools
of linear algebra, one can learn much about a linear differential
operator $T$ and its associated PDE by understanding that
operator’s spectrum [III.88] and eigenfunctions with the help of
tools from functional analysis [IV.15]. A typical problem in
spectral theory is the eigenvalue problem in $\mathbb{R}^d$:

$$-\Delta u(x) + V(x)u(x) = \lambda u(x).$$

A function $u$ that is localized in space (for example, by being
bounded in the $L^2(\mathbb{R}^d)$-norm) and that satisfies this
equation is mapped by the linear operator $-\Delta + V$ to the
function $\lambda u$: we say that $u$ is an eigenfunction with
eigenvalue $\lambda$.
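
As a concrete instance of this eigenvalue problem, one can take $d = 1$ and the potential $V(x) = x^2$ (the harmonic oscillator, chosen here purely for illustration and not taken from the text). The Gaussian $u(x) = e^{-x^2/2}$ is square integrable and satisfies $-u'' + x^2 u = u$, so it should be an eigenfunction with eigenvalue 1. The sketch below checks this numerically with a centered finite difference; the function names, the sample points, and the step size `h` are all assumptions of the sketch.

```python
import math

def u(x):
    """Candidate eigenfunction: a Gaussian, which lies in L^2(R)."""
    return math.exp(-x * x / 2.0)

def V(x):
    """Illustrative potential (harmonic oscillator); an assumption of this sketch."""
    return x * x

def apply_operator(func, x, h=1e-3):
    """Approximate (-Laplacian + V) func at x using a centered second difference."""
    second_derivative = (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)
    return -second_derivative + V(x) * func(x)

def max_residual(eigenvalue, points):
    """Largest value of |(-Laplacian + V)u - eigenvalue * u| over the sample points."""
    return max(abs(apply_operator(u, x) - eigenvalue * u(x)) for x in points)

sample_points = [i / 10.0 for i in range(-40, 41)]
residual_for_1 = max_residual(1.0, sample_points)  # small: 1 is an eigenvalue
residual_for_2 = max_residual(2.0, sample_points)  # not small: 2 is not, for this u
print(residual_for_1, residual_for_2)
```

The residual for the candidate eigenvalue 1 is at the level of the discretization error, while the residual for the value 2 is of order one, which is what one would expect if $u$ is an eigenfunction for the first value only.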

Suppose that we have an eigenfunction $u$ and let
$\phi(t,x) = e^{-i\lambda t}u(x)$. It is easy to check that $\phi$
is a solution of the Schrödinger equation

$$i\partial_t\phi + \Delta\phi - V\phi = 0. \tag{47}$$
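
The check is a short computation. As a sketch, using only the eigenvalue equation $-\Delta u + Vu = \lambda u$ and the definition $\phi(t,x) = e^{-i\lambda t}u(x)$:

```latex
\begin{align*}
i\partial_t\phi &= i(-i\lambda)\,e^{-i\lambda t}u = \lambda\,e^{-i\lambda t}u, \\
\Delta\phi - V\phi &= e^{-i\lambda t}(\Delta u - Vu) = -\lambda\,e^{-i\lambda t}u, \\
i\partial_t\phi + \Delta\phi - V\phi &= (\lambda - \lambda)\,e^{-i\lambda t}u = 0 .
\end{align*}
```

If $\lambda$ is real (an assumption that holds for the self-adjoint operators usually considered), then $|\phi(t,x)| = |u(x)|$ for every $t$, so the solution oscillates in time without changing its spatial profile.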

Moreover, it has a very special form. Such solutions are 
called bound states of the physical system described
by (47). The eigenvalues $\lambda$, which form a discrete set,
correspond to the quantum energy levels of the sys- 
tem. They are very sensitive to the choice of potential 
V. The inverse spectral problem is also important: can 
one determine the potential V from knowledge of the 
corresponding eigenvalues? The eigenvalue problem 
can be studied in considerable generality by replacing 
the operator $-\Delta + V$ with other elliptic operators. For
instance, in geometry it is important to study the eigen- 
value problem for the Laplace-Beltrami operator, which 
is the natural generalization of the Laplace operator 
from $\mathbb{R}^n$ to general Riemannian manifolds [I.3 §6.10].
When the manifold has some arithmetic structure (for 
instance, if it is the quotient of the upper half-plane by 
a discrete arithmetic group), this problem is of major 
importance in number theory, leading, for instance, to 
the theory of Hecke-Maass forms. A famous problem
in differential geometry (“can you hear the shape of 
a drum?”) is to characterize the metric on a compact 
surface from the spectral properties of the associated 
Laplace-Beltrami operator. 

2.6.4 Scattering Theory 

This theory formalizes the intuition from quantum 
mechanics that a potential which is small or localized 
is largely unable to “trap” a quantum particle, which is 
therefore likely to escape to infinity in a manner resembling
that of a free particle. In the case of equation (47), solutions
that scatter are those that behave freely as $t \to \infty$. That
is, they behave like solutions to the free Schrödinger equation
$i\partial_t\psi + \Delta\psi = 0$. A typical problem in
scattering theory is to show that, if $V(x)$ tends to zero
sufficiently fast as $|x| \to \infty$, all solutions, except the
bound states, scatter as $t \to \infty$.

2.7 Conclusions 

In the analytic case, the CK theorem allows us to solve 
the IVP locally for very general classes of PDEs. We have 
a general theory of characteristic hypersurfaces of PDEs 
and a good general understanding of how they relate 
to propagation of singularities. We can also distinguish 
in considerable generality the fundamental classes of 
elliptic and hyperbolic equations and can define general
parabolic and dispersive equations. The IVP for a large class of
nonlinear hyperbolic systems can be solved locally in time, for
sufficiently smooth initial conditions. Similar local-in-time
results hold for general classes of nonlinear parabolic and
dispersive equations. For linear equations a lot more can be done.
We have satisfactory results concerning the regularity
of solutions for elliptic and parabolic equations and
a good understanding of the propagation of singular- 
ities for a large class of hyperbolic equations. Some 
aspects of spectral theory and scattering theory and 
problems of unique continuation can also be studied 
in considerable generality. 

The main defect of the general theory concerns the 
passage from local to global. Important global features 
of special equations are too subtle to fit into a general
scheme. Rather, each important PDE requires special 
treatment. This is particularly true for nonlinear equations:
the long-term behavior of solutions is very sensitive to the
special features of the equation at hand.
Moreover, general points of view may obscure, through 
unnecessary technical complications, the main proper- 
ties of the important special cases. A useful general 
framework is one that provides a simple and elegant 
treatment of a particular phenomenon, as is the case for 
symmetric hyperbolic systems and the phenomenon 
of local well-posedness and finite speed of propaga- 
tion. However, it turns out that symmetric hyperbolic 
systems are simply too general for the study of more 
refined questions about the important examples of 
hyperbolic equations. 

3 General Ideas 

As one turns away from the general theory, one may 
be inclined to accept the pragmatic point of view 
described earlier, according to which PDEs is not a 
real subject but is rather a collection of subjects such 
as hydrodynamics, general relativity, several complex 
variables, elasticity, etc., each organized around a spe- 
cial equation. However, this rather widespread view- 
point has its own serious drawbacks. Even though spe- 
cific equations have specific properties, the tools that 
are used to derive them are intimately related. In fact,
there is an impressive body of knowledge relevant to all 
important equations, or at least large classes of them. 
Lack of space does not allow me to do anything more 
than enumerate them below.⁹


9. I fail to mention in the few examples given above some of the
important functional analytic tools connected to Hilbert space
methods, compactness, the implicit function theorems, etc. I also
fail to mention the importance of probabilistic methods and the
development of topological methods for dealing with global
properties of elliptic PDEs.


3.1 Well-Posedness

As is clear from the previous section, well-posed problems are at
the heart of the modern theory of PDEs. Recall that these are
problems that admit unique solutions for given smooth initial or
boundary conditions, and that the corresponding solutions have to
depend continuously on the data. It is this condition that leads
to the classification of PDEs into elliptic, hyperbolic,
parabolic, and dispersive equations. The first step in the study
of a nonlinear evolution equation is a proof of a local-in-time
existence and uniqueness theorem, similar to the one for ODEs.
Ill-posedness, the counterpart of well-posedness, is also
important in many applications. The Cauchy problem for the wave
equation (3), with data on the timelike hypersurface $z = 0$, is
a typical example. Ill-posed problems appear naturally in control
theory and inverse scattering.

3.2 Explicit Representations and Fundamental Solutions

Our basic equations (2)-(5) can be solved explicitly. For
example, the solution to the IVP for the heat equation in
$\mathbb{R}_+^{1+d}$, that is, the problem of finding a function
$u$ that satisfies

$$-\partial_t u + \Delta u = 0, \qquad u(0,x) = u_0(x),$$

for $t \ge 0$, is given by

$$u(t,x) = \int_{\mathbb{R}^d} E_d(t, x-y)\,u_0(y)\,dy$$

for a certain function $E_d$, which is called the fundamental
solution of the heat operator $-\partial_t + \Delta$. This
function can be defined explicitly: when $t \le 0$ it is 0, and
when $t > 0$ it is given by the formula
$E_d(t,x) = (4\pi t)^{-d/2} e^{-|x|^2/4t}$. Observe that $E_d$
satisfies the equation $(-\partial_t + \Delta)E_d = 0$ in both
regions $t < 0$ and $t > 0$, but it has a singularity at $t = 0$,
which prevents it from satisfying the equation in the whole of
$\mathbb{R}^{1+d}$. In fact, we can check that for any
function¹⁰ $\phi \in C_0^\infty(\mathbb{R}^{d+1})$, we have

$$\int_{\mathbb{R}^{d+1}} E_d(t,x)\,\bigl(\partial_t\phi(t,x) + \Delta\phi(t,x)\bigr)\,dt\,dx = \phi(0,0). \tag{48}$$

In the language of distribution theory [III.18], formula (48)
means that $E_d$, as a distribution, satisfies the equation
$(-\partial_t + \Delta)E_d = \delta_0$, where $\delta_0$ is the
Dirac distribution in $\mathbb{R}^{1+d}$ supported at the origin.
That is, $\delta_0(\phi) = \phi(0,0)$ for all
$\phi \in C_0^\infty(\mathbb{R}^{d+1})$. A similar notion of
fundamental solution can be defined for the Poisson, wave,
Klein-Gordon, and Schrödinger equations.

10. That is, any function that is smooth and has compact support.
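
The representation of the solution as an integral against the fundamental solution can be tested numerically. The sketch below does this in one space dimension for the illustrative initial condition $u_0(y) = e^{-y^2}$, which is not taken from the text; for this particular choice the solution has the closed form $(1+4t)^{-1/2} e^{-x^2/(1+4t)}$. The function names, the truncation of the integral to a finite interval, and the trapezoidal rule are all assumptions of the sketch.

```python
import math

def heat_kernel(t, x):
    """Fundamental solution E_1(t, x) of the heat operator in one space dimension."""
    if t <= 0:
        return 0.0
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def initial_data(y):
    """Illustrative initial condition u_0(y) = exp(-y^2); an assumption of this sketch."""
    return math.exp(-y * y)

def solve_by_convolution(t, x, half_width=12.0, steps=4800):
    """Approximate the integral of E_1(t, x - y) u_0(y) dy with the trapezoidal rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * heat_kernel(t, x - y) * initial_data(y)
    return total * h

def exact_solution(t, x):
    """Closed form for this particular u_0: (1 + 4t)^(-1/2) exp(-x^2 / (1 + 4t))."""
    return math.exp(-x * x / (1.0 + 4.0 * t)) / math.sqrt(1.0 + 4.0 * t)

t, x = 0.5, 0.7
numeric = solve_by_convolution(t, x)
exact = exact_solution(t, x)
print(numeric, exact)
```

The two printed values should agree to many digits, since the integrand is a rapidly decaying Gaussian for which the trapezoidal rule is very accurate.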

A powerful method of solving linear PDEs with constant
coefficients is based on the Fourier transform [III.27]. For
example, consider the heat equation
$\partial_t u - \Delta u = 0$ in one space dimension, with
initial condition $u(0,x) = u_0$. Define $\hat u(t,\xi)$ to be
the Fourier transform of $u$ relative to the space variable:

$$\hat u(t,\xi) = \int_{-\infty}^{\infty} e^{-ix\xi}\,u(t,x)\,dx.$$

It is easy to see that $\hat u(t,\xi)$ satisfies the differential
equation

$$\partial_t \hat u(t,\xi) = -\xi^2\,\hat u(t,\xi), \qquad \hat u(0,\xi) = \hat u_0(\xi).$$

This can be solved by a simple integration, which results in the
formula $\hat u(t,\xi) = \hat u_0(\xi)\,e^{-t|\xi|^2}$. Thus, with
the help of the inverse Fourier transform, we derive a formula
for $u(t,x)$:

$$u(t,x) = (2\pi)^{-1}\int_{-\infty}^{\infty} e^{ix\xi}\,e^{-t|\xi|^2}\,\hat u_0(\xi)\,d\xi.$$
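
The Fourier formula for the heat equation can also be evaluated numerically. The sketch below again uses the illustrative datum $u_0(x) = e^{-x^2}$ (an assumption, not taken from the text), whose Fourier transform in the convention above is $\sqrt{\pi}\,e^{-\xi^2/4}$, and compares the result with the closed form $(1+4t)^{-1/2}e^{-x^2/(1+4t)}$. The function names and the quadrature parameters are hypothetical.

```python
import math

def initial_data_hat(xi):
    """Fourier transform of the illustrative datum u_0(x) = exp(-x^2),
    namely sqrt(pi) * exp(-xi^2 / 4). The choice of u_0 is an assumption."""
    return math.sqrt(math.pi) * math.exp(-xi * xi / 4.0)

def solve_by_fourier(t, x, half_width=40.0, steps=8000):
    """Trapezoidal approximation of
    (2 pi)^(-1) * integral of exp(i x xi) exp(-t xi^2) u0_hat(xi) d xi.
    Only the cosine part is kept, because the sine part is odd in xi
    and integrates to zero."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        xi = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(x * xi) * math.exp(-t * xi * xi) * initial_data_hat(xi)
    return total * h / (2.0 * math.pi)

def exact_solution(t, x):
    """Closed form for this particular u_0: (1 + 4t)^(-1/2) exp(-x^2 / (1 + 4t))."""
    return math.exp(-x * x / (1.0 + 4.0 * t)) / math.sqrt(1.0 + 4.0 * t)

t, x = 0.5, 0.7
fourier_value = solve_by_fourier(t, x)
exact_value = exact_solution(t, x)
print(fourier_value, exact_value)
```

The factor $e^{-t\xi^2}$ damps high frequencies more strongly as $t$ grows, which is the dissipative behavior of parabolic equations seen directly on the Fourier side.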

Similar formulas can be derived for our other basic evolution
equations. For example, in the case of the wave equation
$-\partial_t^2 u + \Delta u = 0$ in three dimensions, subject to
the initial data $u(0,x) = u_0$, $\partial_t u(0,x) = 0$, we find
that

$$u(t,x) = (2\pi)^{-3}\int_{\mathbb{R}^3} e^{ix\cdot\xi}\cos(t|\xi|)\,\hat u_0(\xi)\,d\xi. \tag{49}$$

After some work, one can reexpress formula (49) in the form

$$u(t,x) = \partial_t\Bigl((4\pi t)^{-1}\int_{|x-y|=t} u_0(y)\,da(y)\Bigr), \tag{50}$$

where $da$ is the area element of the sphere $|x-y| = t$ of
radius $t$ centered at $x$. This is the well-known Kirchhoff
formula. By contrast with (49), the integration here is with
respect to the physical variables $t$ and $x$ only. It is
instructive to compare these two formulas. Using the Plancherel
identity it is very easy to deduce from (49) the $L^2$ bound

$$\int_{\mathbb{R}^3} |u(t,x)|^2\,dx \le C\,\|u_0\|^2_{L^2(\mathbb{R}^3)},$$

while the possibility of obtaining such a bound from (50) seems
unlikely since the formula involves a derivative. On the other
hand, (50) is perfect for giving us information about the domain
of influence. Indeed, we can see immediately from the formula
that if $u_0$ is zero outside the ball
$B_a = \{|x - x_0| < a\}$, then $u(t,x)$ is zero outside the ball
$B_{a+|t|}$ for any time $t$. This fact does not seem at all
transparent in the Fourier-based formula (49). The fact that
different representations of
solutions have different, even opposite, strengths and 
weaknesses has important consequences for construct- 
ing approximate solutions, or parametrices, for more 
complicated equations, such as linear equations with 
variable coefficients or nonlinear wave equations. There 
are two possible types of constructions: those in physi- 
cal space, which mimic the physical-space formula (50), 
and those in Fourier space, which mimic the formula 
(49). 

3.3 A Priori Estimates 

Most equations cannot be solved explicitly. However, 
if we are interested in qualitative information about a 
solution, then it is not necessary to derive it from an 
exact formula. But how else, one might wonder, can we 
extract such information? A priori estimates are a very 
important technique for doing this. 

The best-known examples are energy estimates, the 
maximum principle, and monotonicity arguments. The 
simplest example of the first type is the following iden- 
tity (which is a very simple example of a so-called 
Bochner-type identity): 

$$\int_{\mathbb{R}^d} |\partial^2 u(x)|^2\,dx = \int_{\mathbb{R}^d} |\Delta u(x)|^2\,dx.$$

The left-hand side is shorthand for

$$\int_{\mathbb{R}^d} \sum_{1 \le i,j \le d} |\partial_i\partial_j u(x)|^2\,dx$$

and the identity holds for all functions $u$ that are twice
continuously differentiable and tend to zero as
$|x| \to \infty$. This formula can be justified fairly simply by
integrating by parts. As a consequence of the Bochner identity,
we obtain the a priori estimate that if $u$ is a smooth solution
to the Poisson equation (6) with square-integrable data $f$, and
if it tends to zero at infinity, then the square integral of its
second derivatives is bounded:

$$\int_{\mathbb{R}^d} |\partial^2 u(x)|^2\,dx \le \int_{\mathbb{R}^d} |f(x)|^2\,dx < \infty. \tag{51}$$
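
The integration by parts behind the Bochner identity can be written out in a few lines. The following is a sketch under the additional assumption that $u$ is smooth and that $u$ and its derivatives decay fast enough at infinity for all boundary terms to vanish; these hypotheses are stronger than the ones stated in the text.

```latex
% Assumes u is smooth and that u and its derivatives decay fast enough
% at infinity for all boundary terms to vanish.
\begin{align*}
\int_{\mathbb{R}^d} |\partial^2 u|^2\,dx
  &= \sum_{i,j=1}^{d} \int_{\mathbb{R}^d}
     (\partial_i\partial_j u)(\partial_i\partial_j u)\,dx \\
  &= -\sum_{i,j=1}^{d} \int_{\mathbb{R}^d}
     (\partial_j u)(\partial_i^2\partial_j u)\,dx
     && \text{(integrate by parts in } x_i\text{)} \\
  &= \sum_{i,j=1}^{d} \int_{\mathbb{R}^d}
     (\partial_j^2 u)(\partial_i^2 u)\,dx
     && \text{(integrate by parts in } x_j\text{)} \\
  &= \int_{\mathbb{R}^d} |\Delta u|^2\,dx .
\end{align*}
```

If $\Delta u = f$, the last integral equals $\int_{\mathbb{R}^d} |f|^2\,dx$, which gives the bound on the second derivatives of a solution to the Poisson equation.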

Thus we obtain the qualitative fact that, on average (in a
mean-square sense), $u$ has “two more degrees of regularity” than
$f$.¹¹ This is called an energy-type estimate because, in
physical situations, the square of the $L^2$-norm can often be
interpreted as some type of kinetic energy.

11. A crucial fact, about which one can read more in the online
version, is that the $L^2$-norms in (51) can be replaced by
$L^p$-norms, $1 < p < \infty$, or Hölder-type norms. The first
case corresponds to Calderón-Zygmund estimates, while the second
corresponds to Schauder estimates. Both are extremely important
in the study of regularity properties for solutions to
second-order elliptic PDEs.

The Bochner identity can be extended to more general Riemannian
manifolds than $\mathbb{R}^d$, although one then
picks up some additional lower-order terms involving 
the curvature of those manifolds. Such identities play 
a major role in the theory of geometric PDEs on these 
manifolds. 

Energy-type identities and estimates also exist for parabolic,
dispersive, and hyperbolic PDEs. For instance, they play a
fundamental role in demonstrating the local existence,
uniqueness, and finite speed of propagation for hyperbolic PDEs
with smooth initial data. Energy estimates become particularly
powerful when combined with inequalities such as the Sobolev
embedding inequality, which allows one to convert the “$L^2$”
information provided by these estimates into pointwise (or
“$L^\infty$”) type information (see function spaces
[III.29 §§2.4, 3]).

While energy identities and $L^2$ estimates (which, as
in the above example, come from integration by parts)
apply to all, or at least major classes of, PDEs, the
maximum principle can be applied only to elliptic and
parabolic PDEs. The following theorem is the simplest
manifestation of it. Note that the theorem provides us 
with important quantitative information about solu- 
tions to the Laplace equation even in the absence of 
any explicit representation for them. 

Theorem (maximum principle). Assume that $u$ is a solution to
the Laplace equation (1) on a bounded connected domain
$D \subset \mathbb{R}^d$ with a smooth boundary $\partial D$.
Assume also that $u$ is continuous on the closure of $D$ and has
continuous first and second partial derivatives in the interior
of $D$. Then $u$ must achieve its maximum and minimum values on
the boundary. Moreover, if the maximum or minimum is also
achieved at an interior point of $D$, then $u$ must be constant
in $D$.
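
A discrete analogue of the theorem can be observed numerically. The sketch below is an illustration and not a proof: it solves the five-point finite-difference version of the Laplace equation on the unit square, where each interior value is the average of its four neighbors, and then compares the extreme values in the interior with those on the boundary. The grid size, the boundary function $x^2 - y^2$, the number of sweeps, and all function names are assumptions of the sketch.

```python
def solve_laplace(n, boundary_value, sweeps=2000):
    """Gauss-Seidel iteration for the five-point discrete Laplace equation on an
    (n+1) x (n+1) grid over the unit square, with prescribed boundary values."""
    h = 1.0 / n
    u = [[0.0] * (n + 1) for _ in range(n + 1)]
    # Fill in the prescribed values on the four sides of the square.
    for i in range(n + 1):
        for j in range(n + 1):
            if i in (0, n) or j in (0, n):
                u[i][j] = boundary_value(i * h, j * h)
    # Repeatedly replace each interior value by the average of its neighbors.
    for _ in range(sweeps):
        for i in range(1, n):
            for j in range(1, n):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return u

def boundary_value(x, y):
    """Illustrative boundary data: the harmonic polynomial x^2 - y^2."""
    return x * x - y * y

n = 20
grid = solve_laplace(n, boundary_value)
boundary = [grid[i][j] for i in range(n + 1) for j in range(n + 1)
            if i in (0, n) or j in (0, n)]
interior = [grid[i][j] for i in range(1, n) for j in range(1, n)]
print(max(interior), max(boundary), min(interior), min(boundary))
```

Because every interior value is an average of its neighbors, no interior value can exceed the largest boundary value or fall below the smallest one, which mirrors the statement of the theorem.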

The method is very robust and can easily be extended 
to a large class of second-order elliptic equations. It can 
also be extended to parabolic equations and systems, 
and plays a crucial role in, for example, the study of 
Ricci flow. 

Let us briefly mention some other important classes 
of a priori estimates. The Sobolev inequalities, which 
are of prime importance in elliptic equations, have 
several counterparts in linear and nonlinear hyper- 
bolic and dispersive equations, including the Strichartz 
estimates and bilinear estimates. In connection with ill-posed
problems and unique continuation, Carleman estimates play a
fundamental role. Finally, several a priori estimates arising
from monotonicity formulas¹² (such as virial identities,
Pohozaev identities, or Morawetz inequalities) can be used to
establish the
breakdown of regularity or the blow-up of solutions 
to some nonlinear equations, and to guarantee global 
existence and decay of solutions to others. 

To summarize, it is not much of an exaggeration to
say that a priori estimates play a fundamental role in 
more or less every aspect of the modern theory of PDEs. 

3.4 Bootstrap and Continuity Arguments 

The bootstrap argument is a method, or rather a pow- 
erful general philosophy, to derive a priori estimates 
for nonlinear equations. According to this philosophy 
we start by making educated assumptions about the 
solutions we are trying to describe. These assumptions 
allow us to think of the original nonlinear problem as 
a linear one whose coefficients satisfy properties consistent
with the assumptions. We may then use linear methods, based on
other a priori estimates that we already know, to try to show
that the solutions to this linear problem behave as well as we
have postulated (in fact, even better). One can characterize this
powerful method, which allows us to use linear theory without
actually having to linearize the equation, as a conceptual
linearization. It can also be regarded as a continuity argument
relative to some parameter, which might be the natural time
parameter of an evolution problem, but it could also be an
artificial parameter which we have the freedom to introduce
ourselves. This latter situation is typical of applications to
nonlinear elliptic equations. In the online version of this
article we provide a few examples to illustrate the method in
both cases.

3.5 The Method of Generalized Solutions 

Since a PDE involves differentiation, it might seem obvious that
in any discussion of PDEs we should restrict our attention to
differentiable functions. However, it is possible to generalize
the notion of differentiation so that it makes sense for a wider
class of functions, and even for function-like objects, such as
distributions, that are not functions at all. This allows us to
make sense of a PDE in a broader context, and admits the
possibility of generalized solutions.

12. Perhaps the most familiar example of a monotonicity
phenomenon is the second law of thermodynamics from physics,
which asserts that, for many physical systems, the total entropy
of the

The best way to introduce generalized solutions in PDEs and
explain why they are important is through the Dirichlet
principle. This originates in the observation that, out of all
functions that are defined on a bounded domain
$D \subset \mathbb{R}^d$, that satisfy a prescribed Dirichlet
boundary condition $u|_{\partial D} = f$, and that live in an
appropriate functional space $X$, the functions $u$ that minimize
the Dirichlet integral (or Dirichlet functional)

$$\|u\|^2_{\mathrm{Dr}} = \tfrac12\int_D |\nabla u|^2 = \tfrac12\sum_{i=1}^{d}\int_D |\partial_i u|^2 \tag{52}$$

are the harmonic functions (that is, solutions of the equation
$\Delta u = 0$). It was Riemann [VI.49] who first had the idea of
trying to use this fact to solve the Dirichlet problem: in order
to find a solution $u$ to the problem

$$\Delta u = 0, \qquad u|_{\partial D} = u_0, \tag{53}$$

one should find (by some means other than solving the Dirichlet
problem) a function $u$ that minimizes the Dirichlet integral
while equaling $u_0$ on $\partial D$. To do this, one must
specify the set of functions, or rather the function space, over
which the minimization is taking place. The history of how this
choice was made is a fascinating one. A natural choice is
$X = C^1(\bar D)$, the space of continuously differentiable
functions on $\bar D$, where the norm of a function $v$ is

$$\|v\|_{C^1(\bar D)} = \sup_{x \in D}\bigl(|v(x)| + |\partial v(x)|\bigr).$$

In particular, the Dirichlet norm $\|v\|_{\mathrm{Dr}}$ is finite
when $v$ belongs to this space. In fact, Riemann chose
$X = C^2(\bar D)$ (a similar space but designed for twice
continuously differentiable functions). This bold but flawed
attempt was followed by a penetrating criticism by Weierstrass
[VI.44], who showed that the functional does not have to achieve
its minimum in either $C^2(\bar D)$ or $C^1(\bar D)$. However,
Riemann’s basic idea was revived, and it eventually triumphed
after a long and inspiring process that involved defining
appropriate function spaces, introducing the notion of
generalized solutions, and developing a regularity theory for
them. (The precise formulation of the Dirichlet principle also
requires the definition of Sobolev spaces [III.29 §2.4].)
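
The link between minimizing the Dirichlet integral and harmonicity can be seen from a short formal computation. The following is a sketch under the assumption that the minimizer $u$ is twice continuously differentiable up to the boundary, which is exactly the kind of regularity that cannot be taken for granted; the test function $v$ and the parameter $\varepsilon$ are introduced here for illustration.

```latex
% v is any smooth function vanishing on \partial D, so u + \varepsilon v
% satisfies the same boundary condition as u.
\begin{align*}
\tfrac12\int_D |\nabla(u + \varepsilon v)|^2
  &= \tfrac12\int_D |\nabla u|^2
     + \varepsilon\int_D \nabla u\cdot\nabla v
     + \tfrac{\varepsilon^2}{2}\int_D |\nabla v|^2, \\
\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}
  \tfrac12\int_D |\nabla(u + \varepsilon v)|^2
  &= \int_D \nabla u\cdot\nabla v
   = -\int_D (\Delta u)\,v
  && \text{(integration by parts, } v|_{\partial D} = 0\text{)} .
\end{align*}
```

If $u$ is a minimizer, the derivative on the left vanishes for every admissible $v$, which forces $\Delta u = 0$ in $D$. Conversely, if $\Delta u = 0$, the cross term vanishes and the first line shows that $u + \varepsilon v$ has Dirichlet integral at least as large as that of $u$.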

Let us briefly summarize the method, which has since been vastly
extended so that it can be applied to a large class of linear¹³
and nonlinear elliptic and parabolic equations. It is based on
two steps. In the first step one applies a minimization
procedure. Although, as Weierstrass discovered, the natural
function spaces may not contain functions that achieve the
minimum, one can use such a procedure to find a generalized
solution instead. This may not seem very interesting, since we
were looking for a function that solves the Dirichlet problem (or
one of the other problems to which the method can be applied).
But this is where the second step comes in: it is sometimes
possible to show that the generalized solution must in fact be a
classical solution (that is, an appropriately smooth function)
after all. This is the “regularity theory” mentioned earlier. In
some situations, however, the generalized solution may turn out
to have singularities and therefore not be regular. Then the
challenge is to understand the nature of these singularities and
to prove realistic partial regularity results. For instance, it
is sometimes possible to prove that the generalized solution is
smooth everywhere apart from in a small “exceptional set.”

13. A notable example for applications in geometry is Hodge
theory.

Though generalized solutions are at their most effective
for elliptic problems, their range of applicability
encompasses all PDEs. For example, we have already 
seen that the fundamental solutions to the basic lin- 
ear equations have to be interpreted as distributions, 
which are examples of generalized solutions. 

The notion of generalized solutions has also proved 
successful for nonlinear evolution problems, such as 
systems of conservation laws in one space dimension. 
An excellent example is provided by the Burgers equation (21).
As we have seen, solutions to
$\partial_t u + u\,\partial_x u = 0$ develop singularities in
finite time no matter how smooth the initial conditions are. It
is natural to ask whether solutions continue to make sense, as
generalized solutions, even beyond the time when these
singularities form. A natural notion of generalized solution is a
function $u$ such that

$$\int_{\mathbb{R}^{1+1}} (\partial_t u + u\,\partial_x u)\,\phi = 0$$

for every smooth function $\phi$ that is zero outside a bounded
set, since one can make sense of the integral even when $u$ is
not a differentiable function. Integrating this by parts (the
first term with respect to $t$ and the second with respect to
$x$) one obtains the following formulation:

$$\int_{\mathbb{R}^{1+1}} u\,\partial_t\phi + \tfrac12\int_{\mathbb{R}^{1+1}} u^2\,\partial_x\phi = 0 \qquad \forall \phi \in C_0^\infty(\mathbb{R}^{1+1}).$$

It can be shown that, under additional conditions called
entropy conditions, the IVP for the Burgers equation
admits a unique generalized solution that is global:
that is, valid for every $t \in \mathbb{R}$. Today we have a
satisfactory theory of global solutions to a large class of
hyperbolic systems of one-dimensional “conservation 
laws.” These systems, for which the above-mentioned 
theory applies, are called strictly hyperbolic. 

For more complicated nonlinear evolution equations, the question
of what constitutes a good concept of a generalized solution,
though fundamental, is far murkier. For higher-dimensional
evolution equations the first concept of a weak solution was
introduced by Leray. Let us call a generalized solution weak if
one cannot prove any type of uniqueness for it. This
unsatisfactory situation may be temporary, i.e., the result of
our technical inabilities, or unavoidable, in the sense that the
concept itself is flawed. Leray was able to produce, by a
compactness method, a weak solution of the IVP for the
Navier-Stokes equations [III.23]. The great advantage of the
compactness method (and its modern extensions, which can, in some
cases, cleverly circumvent lack of compactness) is that it
produces global solutions for all data. This is particularly
important for supercritical or critical nonlinear evolution
equations, which we will discuss later. For these we expect
classical solutions to develop singularities in a finite time.
The problem, however, is that one has very little control over
such solutions. In particular, we do not know how to prove their
uniqueness.¹⁴ Similar types of solutions were later introduced
for other important nonlinear evolution equations. In most of the
interesting cases of supercritical evolution equations, such as
the Navier-Stokes equations, the usefulness of the types of weak
solutions discovered so far remains undecided.

14. Leray was very concerned about this point. Though, like all
other researchers after him, he was unable to prove uniqueness of
his weak solution, he managed to show that it must coincide with
a classical one as long as the latter does not develop
singularities.

3.6 Microlocal Analysis, Parametrices, and
Paradifferential Calculus

One of the fundamental difficulties of hyperbolic and dispersive
equations is the interplay between geometric properties, which
concern the physical space, and other properties, intimately tied
to oscillations, that are best seen in Fourier space. Microlocal
analysis is a general, still-developing philosophy according to
which one isolates the main difficulties by careful localizations
in physical space or Fourier space or both. An important
application of this point of view is the construction of
parametrices for linear hyperbolic equations and their use in
proving results about the propagation of singularities.
Parametrices, as we have already mentioned, are approximate
solutions of linear equations with variable coefficients, with
error terms that are smoother. The paradifferential calculus is
an extension of microlocal analysis to nonlinear equations. It
allows one to manipulate the form of a nonlinear equation by
taking account of how large and small frequencies interact, and
it has achieved a remarkable technical versatility.

3.7 Scaling Properties of Nonlinear Equations 

A PDE is said to have a scaling property if, whenever one 
rescales a solution in an appropriate way, one obtains 
another solution. Essentially, all basic nonlinear equations
have well-defined scaling properties. Take, for example, the
Burgers equation (21), $\partial_t u + u\,\partial_x u = 0$. If
$u$ is a solution of this equation, then so is the function
$u_\lambda$ defined by $u_\lambda(t,x) = u(\lambda t, \lambda x)$.
Similarly, if $u$ is a solution of the cubic nonlinear
Schrödinger equation in $\mathbb{R}^d$,

$$i\partial_t u + \Delta u + c|u|^2 u = 0, \tag{54}$$

then so is $u_\lambda(t,x) = \lambda u(\lambda^2 t, \lambda x)$.
The relationship
between the nonlinear scaling of the equation and 
the a priori estimates available for solutions to the 
equations leads to an extremely useful classification 
of equations into subcritical, critical, and supercritical 
equations. This will be discussed in more detail in the 
next section. For the moment it suffices to say that sub- 
critical equations are those for which the nonlinearity 
can be controlled by the existing a priori estimates of 
the equation, while supercritical equations are those 
for which the nonlinearity appears to be stronger. Crit- 
ical equations are borderline. The definition of critical- 
ity and its relationship with the issue of regularity play 
a very important heuristic role in nonlinear PDEs. One 
expects supercritical equations to develop singularities 
and subcritical equations not to. 
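
The two scaling properties stated above can be verified directly with the chain rule. In the sketch below, all functions on the right-hand sides are evaluated at the rescaled point, $(\lambda t, \lambda x)$ for the Burgers equation and $(\lambda^2 t, \lambda x)$ for the cubic nonlinear Schrödinger equation; $\lambda > 0$ is the scaling parameter.

```latex
% Burgers equation, u_\lambda(t,x) = u(\lambda t, \lambda x):
\begin{align*}
\partial_t u_\lambda + u_\lambda\,\partial_x u_\lambda
  &= \lambda\,\partial_t u + u\cdot\lambda\,\partial_x u
   = \lambda\,(\partial_t u + u\,\partial_x u) = 0 .
\end{align*}
% Cubic NLS, u_\lambda(t,x) = \lambda u(\lambda^2 t, \lambda x):
\begin{align*}
i\partial_t u_\lambda &= \lambda^3\, i\partial_t u, \qquad
\Delta u_\lambda = \lambda^3\,\Delta u, \qquad
c|u_\lambda|^2 u_\lambda = \lambda^3\, c|u|^2 u, \\
i\partial_t u_\lambda + \Delta u_\lambda + c|u_\lambda|^2 u_\lambda
  &= \lambda^3\,\bigl(i\partial_t u + \Delta u + c|u|^2 u\bigr) = 0 .
\end{align*}
```

In each case every term of the equation picks up the same power of $\lambda$, which is why the rescaled function is again a solution.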

4 The Main Equations 

In the previous section we argued that, while there is 
no hope of finding a general theory of all PDEs, there is
nevertheless a wealth of general ideas and techniques 
that are relevant to the study of almost all important 
equations. In this section we indicate how it may be 
possible to identify the features that characterize the 
equations we call important. 

Most of our basic PDEs can be derived from simple 
geometric principles, which happen to coincide with some of the
underlying geometric principles of modern physics. These simple
principles provide a unifying framework¹⁵ for the subject and
help endow it with a sense of purpose and cohesion. They also
explain why a very small number of linear differential
operators, such as the Laplacian and the d’Alembertian, are
all-pervasive. 

Let us begin with the operators. The Laplacian is the 
simplest differential operator that is invariant under 
rigid motions of Euclidean space, a fact that we noted
at the beginning of this article. This is important math- 
ematically and physically: mathematically because it 
results in many symmetry properties and physically 
because many physical laws are themselves invari- 
ant under rigid motions. The d’Alembertian is, simi- 
larly, the simplest differential operator that is invariant 
under the natural symmetries, or Poincaré transforma- 
tions, of Minkowski space. 

Now let us turn to the equations. From the point of 
view of physics, the heat equation is basic because it is 
the simplest paradigm for diffusive phenomena, while 
the Schrödinger equation can be viewed as the Newtonian limit of
the Klein-Gordon equation. The geometric framework of the former
is Galilean space, which itself is simply the Newtonian limit of
Minkowski space.¹⁶

From a mathematical point of view, the heat, Schrödinger, and
wave equations are basic because the corresponding differential
operators $\partial_t - \Delta$, $(1/i)\partial_t - \Delta$, and
$\partial_t^2 - \Delta$ are the simplest evolution operators that
can be built out of $\Delta$. The wave operator, as just
discussed, is basic in a deeper way because of the association
between $\Box = -\partial_t^2 + \Delta$ and the geometry of
Minkowski space $\mathbb{R}^{1+n}$. As for Laplace’s equation,
one can view solutions to $\Delta\phi = 0$ as special
time-independent solutions to $\Box\phi = 0$. Appropriate
invariant and local definitions of square roots of $\Delta$ and
$\Box$, or $\Box - k^2$, corresponding to “spinorial
representations” of the Lorentz group, lead to the associated
Dirac operators (see (13)). In the same vein we can associate
with every Riemannian or Lorentzian manifold the operator
$\Delta_g$ or $\Box_g$, respectively, or the corresponding Dirac
operators.
These equations inherit in a straightforward way the 
symmetries of the spaces on which they are defined. 

15. The scheme sketched below is only an attempt to show that, in 
spite of the enormous number of PDEs studied by mathematicians, 
physicists, and engineers, there are nevertheless simple basic princi- 
ples that unite them. I do not want, by any means, to imply that the 
equations discussed below are the only ones worthy of our attention. 

16. This is done by starting with the Minkowski metric
$m = \mathrm{diag}(-1/c^2, 1, 1, 1)$, where $c$ corresponds to the velocity of light, and


4.1 Variational Equations 

There is a general and extremely effective method for
generating equations with prescribed symmetries that
plays a fundamental role in both physics and geometry.
One starts with a scalar quantity, called a Lagrangian,
such as

$$\mathcal{L}[\phi] = -\tfrac{1}{2}\sum_{\mu,\nu=0}^{3} m^{\mu\nu}\,\partial_\mu\phi\,\partial_\nu\phi - V(\phi), \qquad (55)$$

with $\phi$ a real-valued function defined on $\mathbb{R}^{1+3}$ and $V$
some real function of $\phi$ such as, for example, $V(\phi) = \phi^3$.
Here $\partial_\mu$ denotes the partial derivatives with respect
to the coordinates $x^\mu$, $\mu = 0, 1, 2, 3$, and $m^{\mu\nu} = m_{\mu\nu}$,
as earlier, denotes the $4 \times 4$ diagonal matrix with diagonal
entries $(-1, 1, 1, 1)$, associated with the Minkowski
metric. We associate with $\mathcal{L}[\phi]$ the so-called action
integral:

$$S[\phi] = \int_{\mathbb{R}^{3+1}} \mathcal{L}[\phi].$$

Notice that both $\mathcal{L}[\phi]$ and $S[\phi]$ are invariant under
translations and Lorentz transformations. In other
words, if $T : \mathbb{R}^{1+3} \to \mathbb{R}^{1+3}$ is a function that does not
change the metric and we define a new function by
$\psi(t,x) = \phi(T(t,x))$, then $\mathcal{L}[\phi] = \mathcal{L}[\psi]$ and $S[\phi] = S[\psi]$.

We shall consider a function $\phi$ that minimizes the
action integral. From this we wish to deduce that its
derivative, in some appropriate sense, is zero, and
hence to deduce other properties about $\phi$. But $\phi$ is
a function that lives in an infinite-dimensional space,
so we cannot talk about derivatives in a completely
straightforward way. To deal with this problem, we
define a compact variation of $\phi$ to be a smooth one-parameter
family of functions $\phi^{(s)} : \mathbb{R}^{1+3} \to \mathbb{R}$, defined
for each $s$ in some interval $(-\epsilon, \epsilon)$, such that $\phi^{(0)}(x) = \phi(x)$
for every $x \in \mathbb{R}^{1+3}$ and $\phi^{(s)}(x) = \phi(x)$ for
every $s$ and every $x$ outside some bounded subset of $\mathbb{R}^{1+3}$. This
allows us to differentiate with respect to $s$.

Given such a variation, we denote the derivative
$d\phi^{(s)}/ds\,|_{s=0}$ by $\dot\phi$.

Definition. A field $\phi$ is said to be stationary with
respect to $S$ if, for any compact variation $\phi^{(s)}$ of $\phi$,

$$\frac{d}{ds} S[\phi^{(s)}]\Big|_{s=0} = 0.$$

The variational principle. The variational principle,
or principle of least action, states that an acceptable
solution of a given physical system must be stationary
with respect to the action integral associated with the
Lagrangian of the system.

The variational principle enables us to associate with
the given Lagrangian a system of PDEs, obtained from
the fact that $\phi$ is stationary, called the Euler-Lagrange
equations. We illustrate this by showing that the nonlinear
wave equation in $\mathbb{R}^{1+3}$, namely

$$\Box\phi - V'(\phi) = 0, \qquad (56)$$

is the Euler-Lagrange equation associated with the
Lagrangian (55). Given a compact variation $\phi^{(s)}$ of $\phi$,
we set $S(s) = S[\phi^{(s)}]$. Integration by parts gives

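$$\frac{d}{ds}S(s)\Big|_{s=0} = \int_{\mathbb{R}^{3+1}} \big[-m^{\mu\nu}\,\partial_\mu\dot\phi\,\partial_\nu\phi - V'(\phi)\,\dot\phi\big] = \int_{\mathbb{R}^{3+1}} \dot\phi\,\big[\Box\phi - V'(\phi)\big].$$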

In view of the action principle and the arbitrariness of
$\dot\phi$ we infer that $\phi$ must satisfy equation (56). Thus (56)
is indeed the Euler-Lagrange equation associated with
the Lagrangian $\mathcal{L}[\phi] = -\tfrac{1}{2} m^{\mu\nu}\,\partial_\mu\phi\,\partial_\nu\phi - V(\phi)$.
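As a cross-check on this computation, the same equation can be read off from the general Euler-Lagrange formula for a Lagrangian density that depends on $\phi$ and its first derivatives. The sketch below assumes that the kinetic term is normalized with the factor $-\tfrac{1}{2}$, so that in coordinates the Lagrangian reads $\tfrac{1}{2}(\partial_t\phi)^2 - \tfrac{1}{2}\sum_i(\partial_i\phi)^2 - V(\phi)$; with a different normalization the resulting equation changes by the corresponding constant factor.

```latex
% Assumption: kinetic term normalized as -(1/2) m^{\mu\nu} \partial_\mu\phi \partial_\nu\phi,
% with m = diag(-1, 1, 1, 1) and \Box = -\partial_t^2 + \Delta.
\begin{align*}
\mathcal{L} &= \tfrac12(\partial_t\phi)^2 - \tfrac12\sum_{i=1}^{3}(\partial_i\phi)^2 - V(\phi),\\
\frac{\partial\mathcal{L}}{\partial(\partial_t\phi)} &= \partial_t\phi, \qquad
\frac{\partial\mathcal{L}}{\partial(\partial_i\phi)} = -\partial_i\phi, \qquad
\frac{\partial\mathcal{L}}{\partial\phi} = -V'(\phi),\\
0 &= \frac{\partial\mathcal{L}}{\partial\phi}
     - \sum_{\mu=0}^{3}\partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}
   = -V'(\phi) - \partial_t^2\phi + \Delta\phi
   = \Box\phi - V'(\phi).
\end{align*}
```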

One can similarly show that the Maxwell equations 
of electromagnetism — along with their beautiful exten- 
sions to the Yang-Mills equations, wave maps, and 
the Einstein equations of general relativity— are also 
variational. That is, they too can be derived from a 
Lagrangian. 

Remark. The variational principle asserts only that 
the acceptable solutions of a given system are sta- 
tionary: in general, we have no reason to expect that 
the desired solutions minimize or maximize the action 
integral. Indeed, this fails to be the case for systems 
that have a time dependence, such as the Maxwell equa- 
tions, Yang-Mills equations, wave maps, and Einstein 
equations. 

However, there is a large class of variational prob- 
lems, corresponding to time-independent physical sys- 
tems or geometric problems, for which the desired 
solutions do turn out to be extremal. The simplest 
example is that of geodesics in a Riemannian manifold
M, which are minimizers 17 with respect to length.
More precisely, the length functional takes a curve $\gamma$
that passes through two fixed points of M and associates
with it its length $L(\gamma)$, which plays the role of
an action integral. In this case a geodesic is not just a
stationary point for the functional but a minimum. We
also saw earlier that, according to the Dirichlet principle,
solutions to the Dirichlet problem (53) minimize
the Dirichlet integral (52). Another example is provided
by the minimal-surface equation (7), the solutions of
which are minimizers of the area integral.

17. This is true, in general, only for sufficiently short geodesics, i.e.,
ones that pass through two points close to each other.

The study of minimizers of various functionals, i.e., 
action integrals, is a venerable subject in mathematics 
that goes under the name of calculus of variations (see 
variational methods [III.96] for further discussion). 

Associated with the variational principle is another 
fundamental principle. A conservation law for an evo- 
lution PDE is a law that says that some quantity, typ- 
ically an integral quantity depending on the solution, 
must remain constant over time, for every solution of
the equation. 

Noether’s principle. To any continuous one-parameter 
group of symmetries of the Lagrangian there corre- 
sponds a conservation law for the associated Euler- 
Lagrange PDE. 

Examples of such conservation laws are the familiar
laws of conservation of energy, conservation of
momentum, and conservation of angular momentum,
all of which have important physical meaning. For
example, in the case of equation (56), the law of
conservation of energy (for which the one-parameter group
of symmetries is just translations in time) takes the form

$$E(t) = E(0), \qquad (57)$$

where the quantity $E(t)$, which equals

$$\int_{\Sigma_t} \Big(\tfrac12(\partial_t\phi)^2 + \tfrac12\sum_{i=1}^{3}(\partial_i\phi)^2 + V(\phi)\Big)\,dx, \qquad (58)$$

is called the total energy at time $t$. (The notation
$\Sigma_t$ stands for the set of all points $(t,x,y,z)$ as
$(x,y,z)$ ranges over $\mathbb{R}^3$.) Observe that (57) provides an
extremely important a priori estimate for solutions to
(56) in the case when $V \ge 0$. Indeed, if the energy of the
initial data at $t = 0$ is finite (that is, if $E(0) < \infty$), then

$$\int_{\Sigma_t} \Big(\tfrac12(\partial_t\phi)^2 + \tfrac12\sum_{i=1}^{3}(\partial_i\phi)^2\Big)\,dx \le E(0).$$

We say that the energy identity (57) is coercive, which
means that it leads to an absolute bound on all solutions
with finite initial energy.
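For readers who want to see where the energy identity comes from without invoking Noether's principle, here is a direct verification. It is only a sketch: it assumes that $\phi$ is a smooth solution of (56) that decays fast enough at spatial infinity for the boundary term to vanish, and it writes $\Box = -\partial_t^2 + \Delta$.

```latex
% Multiply (56) by \partial_t\phi and rewrite each term as a derivative.
\begin{align*}
0 &= \partial_t\phi\,\big(-\partial_t^2\phi + \Delta\phi - V'(\phi)\big)\\
  &= -\partial_t\Big(\tfrac12(\partial_t\phi)^2 + \tfrac12|\nabla\phi|^2 + V(\phi)\Big)
     + \nabla\cdot\big(\partial_t\phi\,\nabla\phi\big).
\end{align*}
% Integrate over \Sigma_t; the divergence term integrates to zero under the
% decay assumption, which leaves
\[
\frac{d}{dt}E(t)
 = \frac{d}{dt}\int_{\Sigma_t}\Big(\tfrac12(\partial_t\phi)^2 + \tfrac12|\nabla\phi|^2 + V(\phi)\Big)\,dx = 0 .
\]
```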

4.2 The Issue of Criticality 

For the most basic evolution equations of mathematical
physics, there are typically no better a priori estimates
known than those provided by the energy. Taking
into account the scaling properties of the corresponding
equations as well, one is led to the very important
classification of our basic equations, mentioned
earlier, into subcritical, critical, and supercritical equations.
To see how this is done, consider again the nonlinear
scalar equation $\Box\phi - V'(\phi) = 0$, and take $V(\phi)$ to
be $(1/(p+1))|\phi|^{p+1}$. Recall that the energy integral is
given by (58). If we assign to the spacetime variables the
dimension of length, $L$, then the spacetime derivatives
have dimension $L^{-1}$ and therefore $\Box$ has the dimension
of $L^{-2}$. To be able to balance the left- and right-hand
sides of the equation $\Box\phi = |\phi|^{p-1}\phi$, we need to assign
a length scale to $\phi$; we find this to be $L^{2/(1-p)}$. Thus the
energy integral,

$$E(t) = \int_{\mathbb{R}^d} \Big(2^{-1}|\partial\phi|^2 + (p+1)^{-1}|\phi|^{p+1}\Big)\,dx,$$

has the dimension $L^c$, $c = d - 2 + (4/(1-p))$, with $d$ corresponding
to the volume element $dx = dx^1\,dx^2 \cdots dx^d$,
which scales like $L^d$. We say that the equation is
subcritical if $c < 0$, critical if $c = 0$, and supercritical
if $c > 0$. Thus, for example, $\Box\phi - \phi^5 = 0$ is critical in
dimension $d = 3$. The same sort of dimensional analysis
can be done for all our other basic equations. An
evolutionary PDE is said to be regular if all smooth
finite-energy initial conditions lead to global smooth
solutions. It is conjectured that all subcritical equa- 
tions are regular, but one expects supercritical equa- 
tions to develop singularities. Critical equations are 
important borderline cases. The heuristic reason for 
this is that the nonlinearity tends to produce singu- 
larities while the coercive estimates prevent it. In sub- 
critical equations the coercive estimates are stronger, 
while for supercritical equations it is the nonlinearity 
that is stronger. However, there may be other, more 
subtle a priori estimates that are not accounted for by 
our crude heuristic argument. Thus, some supercritical 
equations, such as the Navier-Stokes equations, may 
still be regular. 
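The dimensional bookkeeping above is easy to mechanize. The following is a minimal sketch that evaluates the exponent $c = d - 2 + 4/(1-p)$ for the model equation with $V(\phi) = (1/(p+1))|\phi|^{p+1}$ and applies the sign convention stated in the text; the function names are our own and are purely illustrative.

```python
from fractions import Fraction


def energy_scaling_exponent(d: int, p: int) -> Fraction:
    """Return c = d - 2 + 4/(1 - p), the length dimension of the energy integral."""
    if p == 1:
        # p = 1 is the linear case, for which the scaling argument does not fix a scale.
        raise ValueError("the scaling argument requires p != 1")
    return Fraction(d - 2) + Fraction(4, 1 - p)


def classify(d: int, p: int) -> str:
    """Classify the equation by the sign of c, following the convention in the text."""
    c = energy_scaling_exponent(d, p)
    if c < 0:
        return "subcritical"
    if c == 0:
        return "critical"
    return "supercritical"


# Three powers in spatial dimension d = 3.
for p in (3, 5, 7):
    print(p, energy_scaling_exponent(3, p), classify(3, p))
```

In dimension $d = 3$ this reproduces the statement that the quintic equation is critical, with the cubic equation on the subcritical side and the septic one on the supercritical side.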

4.3 Other Equations 

Many other familiar equations can be derived from 
the variational ones described above by the following 
procedures. 

4.3.1 Symmetry Reductions 

Sometimes a PDE is very hard to solve but becomes 
much easier if one places additional symmetry con- 
straints on solutions. For example, if the PDE is rota- 
tion invariant and we look just for rotation-invariant 
solutions u(t,x), then we can regard these solutions 
as functions of t and r = |x|, effectively reducing the 


dimension of the problem. By this procedure of sym- 
metry reduction one can then derive a new PDE that 
is much simpler than the original one. Another, some- 
what more general, way of obtaining simpler equations 
is to look for solutions that satisfy some further prop- 
erty. For instance, one can assume that they are station- 
ary (that is, that they do not depend on the time vari- 
able), spherically symmetric, self-similar (which means 
that $u(t,x)$ depends only on $x/t^a$), or traveling waves
(which means that $u(t,x)$ depends only on $x - vt$ for
some fixed velocity vector $v$). Typically, the equations
obtained by such reductions have a variational structure
themselves. In fact, the symmetry reduction can
be applied directly to the original Lagrangian.
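To make the procedure concrete, consider rotation-invariant solutions $\phi = \phi(t, r)$ of the nonlinear wave equation $\Box\phi - V'(\phi) = 0$ in three space dimensions. The following sketch is an illustration of ours rather than a computation taken from the text; it assumes $\Box = -\partial_t^2 + \Delta$ and a Lagrangian with kinetic term normalized by $\tfrac12$.

```latex
% Radial part of the Laplacian in R^3 acting on \phi(t, r):
\[
\Delta\phi = \partial_r^2\phi + \frac{2}{r}\,\partial_r\phi
\quad\Longrightarrow\quad
-\partial_t^2\phi + \partial_r^2\phi + \frac{2}{r}\,\partial_r\phi - V'(\phi) = 0 .
\]
% The same equation follows from the reduced Lagrangian obtained by
% integrating over spheres (up to the constant factor 4\pi):
\[
\ell[\phi] = r^2\Big(\tfrac12(\partial_t\phi)^2 - \tfrac12(\partial_r\phi)^2 - V(\phi)\Big),
\qquad
\partial_t\big(r^2\partial_t\phi\big) - \partial_r\big(r^2\partial_r\phi\big) + r^2 V'(\phi) = 0 .
\]
% In the linear case V = 0 the substitution \psi = r\phi gives the
% one-dimensional wave equation:
\[
\partial_r^2(r\phi) = r\Big(\partial_r^2\phi + \frac{2}{r}\,\partial_r\phi\Big)
\quad\Longrightarrow\quad
-\partial_t^2\psi + \partial_r^2\psi = 0 .
\]
```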

4.3.2 The Newtonian Approximation and Other Limits 

We can derive a large class of new equations as limits of 
the basic ones described above by taking one or more 
characteristic speeds to infinity. The most important 
example is the Newtonian limit, which is obtained by 
letting the velocity of light go to infinity. As we have 
already mentioned, the Schrodinger equation can be 
derived in this way from the linear Klein-Gordon equa- 
tion. Similarly, we can derive the Lagrangians for the 
equations of nonrelativistic elasticity, fluid dynamics, 
or magnetohydrodynamics. It is an interesting fact that
the nonrelativistic equations tend to look more messy 
than the relativistic ones. The simple geometric struc- 
ture of the original equations gets lost in the limit. The 
remarkable simplicity of the relativistic equations is a 
powerful example of the importance of relativity as a 
unifying principle. 

Once we are in the familiar world of Newtonian 
physics we can perform other well-known limiting procedures.
The famous incompressible Euler equations
[III.23] are obtained by taking the limit of the
general nonrelativistic fluid equations as the speed 
of sound tends to infinity. Various other limits are 
obtained relative to other characteristic speeds of the 
system or in connection with specific boundary conditions,
such as the boundary-layer approximation in
fluids. For example, in the limit as all characteristic 
speeds tend to infinity, the equations of elasticity turn 
into the familiar equations of a rigid body in classical 
mechanics. 

4.3.3 Phenomenological Assumptions 

Even after taking various limits and making symmetry 
reductions, the equations may still remain intractable.
However, in various applications it makes sense to
assume that certain quantities are sufficiently small to 
be neglected. This leads to simplified equations that 
could be called phenomenological 18 in the sense that 
they are not derived from first principles. 

Phenomenological equations are “toy equations” that 
are used to illustrate and isolate important physical 
phenomena in complicated systems. A typical way of 
generating interesting phenomenological equations is 
to try to write down the simplest model equation that 
still exhibits a particular feature of the original system.
For instance, the self-focusing plane-wave effects
of compressible fluids or elasticity can be illustrated by
the simple-minded Burgers equation $u_t + uu_x = 0$. Nonlinear
dispersive phenomena, typical of fluids, can be
illustrated by the famous Korteweg-de Vries equation
$u_t + uu_x + u_{xxx} = 0$. The nonlinear Schrödinger equation
(54) provides a good model problem for nonlinear
dispersive effects in optics.

If it is well chosen, a model equation can lead to 
basic insights into the original equation itself. For this 
reason, simplified model problems are also essential 
in the day-to-day work of the rigorous researcher into 
PDEs, who tests ideas on carefully selected model prob- 
lems. It is crucial to emphasize that good results con- 
cerning the basic physical equations are rare; a very 
large percentage of important rigorous work in PDEs 
deals with simplified equations selected, for technical 
reasons, to isolate and focus our attention on some 
specific difficulties present in the basic equations. 

In the above discussion we have not mentioned diffu- 
sive equations 19 such as the Navier-Stokes equations. 
These are in fact not variational, and therefore do
not quite fit into the above description. Though they 
could be viewed as phenomenological equations, they 
can also be derived from basic microscopic laws such 
as those governing the Newtonian-mechanical interac- 
tions of a very large number of particles N. In prin- 
ciple, 20 the equations of continuum mechanics, such 
as the Navier-Stokes equations, could be derived by 
letting the number of particles $N \to \infty$.

Diffusive equations also turn out to be very useful in
connection with geometric problems. Geometric flows
such as mean curvature, inverse mean curvature, harmonic
maps, Gauss curvature, and Ricci flow are some
of the best-known examples. Diffusive equations can
often be interpreted as the gradient flow for an associated
elliptic variational problem. They can be used to
construct nontrivial stationary solutions to the corresponding
stationary systems, in the limit as $t \to \infty$, or
to produce foliations with remarkable properties, such
as one that was used recently in the proof of a famous
conjecture of Penrose. As we have already mentioned,
this idea has recently found an extraordinary application
in the work of Perelman, who has used Ricci flow to
settle the three-dimensional Poincaré conjecture. One
of his main new ideas was to interpret Ricci flow as a
gradient flow.

18. I use this term here quite freely; it is typically used in a somewhat
different context. Also, some of the equations that I call phenomenological
below, e.g., dispersive equations, can be given formal
asymptotic derivations.

19. That is, equations where some of the basic physical quantities,
such as energy, are not conserved and may in fact decrease in time.
These are typically of parabolic type.

20. To establish this rigorously remains a major challenge.

4.4 Regularity or Breakdown 

An additional source of unity for the subject of PDEs 
is the central role played by the problem of regularity 
or breakdown of solutions to the basic equations. It is 
intimately tied to the fundamental mathematical question
of understanding what we actually mean by solutions
and, from a physical point of view, to the issue of
understanding the limits of validity of the corresponding
physical theories. Thus, in the case of the Burgers
equation, for example, the problem of singularities can
be tackled by extending our concept of solutions to
accommodate shock waves, which are solutions that are
discontinuous across certain curves in the $(t, x)$-space.
In this case one can define a function space of generalized
solutions in which the IVP has unique, global solutions.
Though the situation for more realistic physical
systems is far less clear and far from being satisfactorily
solved, the generally held opinion is that shock-wave-type
singularities can be accommodated without
breaking the boundaries of the physical theory at hand.
The situation for singularities in general relativity is 
radically different. The singularities one expects there 
are such that no continuation of solutions is possible 
without altering the physical theory itself. The prevail- 
ing opinion here is that only a gravitational quantum 
field theory could achieve this. 
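The breakdown mechanism for the Burgers equation $u_t + uu_x = 0$ can be made quantitative with the classical method of characteristics, which is not spelled out in the text: a smooth solution is constant along the lines $x = x_0 + t\,u_0(x_0)$, and these lines first cross at time $T^* = -1/\min u_0'$ whenever $u_0'$ is somewhere negative. The sketch below illustrates this for the initial profile $u_0 = \sin x$, which is our own choice of example; all function names are illustrative.

```python
import math


def breakdown_time(du0, xs):
    """Estimate T* = -1 / min u0'(x) over the sample points xs (inf if u0' >= 0 there)."""
    m = min(du0(x) for x in xs)
    return math.inf if m >= 0 else -1.0 / m


def characteristic(x0, t, u0):
    """Position at time t of the characteristic that starts at x0."""
    return x0 + t * u0(x0)


def crossed(t, h, u0=math.sin, center=math.pi):
    """True if the characteristics from center - h and center + h have swapped order by time t."""
    left = characteristic(center - h, t, u0)
    right = characteristic(center + h, t, u0)
    return left > right


# Assumed example: u0(x) = sin(x), so u0'(x) = cos(x) with minimum -1 at x = pi.
xs = [2 * math.pi * k / 1000 for k in range(1001)]
T_star = breakdown_time(math.cos, xs)
print("estimated breakdown time:", T_star)
print("crossed at t = 0.9:", crossed(0.9, 0.1))
print("crossed at t = 1.1:", crossed(1.1, 0.1))
```

Before the breakdown time nearby characteristics keep their order; shortly after it they have crossed, which is the point at which the classical solution must be replaced by a generalized one containing a shock.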

5 General Conclusions 

What, then, is the modern theory of PDEs? As a first 
approximation, one could say that it is the pursuit of 
the following main goals.

(i) Understand the problem of evolution for the basic
equations of mathematical physics. The most press- 
ing issue in this regard is to understand when and 
how the local 21 (with respect to time) smooth solutions 
of the basic equations develop singularities. A simple- 
minded criterion for distinguishing between regular 
theories and those that may admit singular solutions is 
given by the distinction between subcritical and super- 
critical equations. As mentioned earlier, it is widely 
believed that subcritical equations are regular and that 
supercritical equations are not. Indeed, many subcrit- 
ical equations have been proved to be regular even 
though we lack a general procedure for establishing 
regularity results of this kind. The situation with super- 
critical equations is far more subtle. To start with, 
an equation that we now call supercritical 22 may in 
fact turn out to be critical, or even subcritical, upon
the discovery of additional a priori estimates. Thus 
an important question concerning the issue of critical- 
ity, and consequently that of singular behavior, is: are 
there other, stronger, local a priori bounds that can- 
not be derived from Noether’s principle? The discov- 
ery of such a bound would be a major event in both 
mathematics and physics. 

Once we understand that the presence of singulari- 
ties in our basic evolution equations is unavoidable, we 
have to face the question of whether they can somehow 
be accommodated by a more general concept of what 
a solution is or whether their structure is such that the 
equation itself, indeed the physical theory that it under- 
lies, becomes meaningless. An acceptable concept of 
a generalized solution should, of course, preserve the 
deterministic nature of the equations: in other words, 
it should be uniquely determined from its Cauchy data. 

Finally, once an acceptable concept of generalized 
solutions is found, we would like to use it to deter- 
mine some important qualitative features, such as long- 
term asymptotic behavior. One can formulate a limitless
number of such questions, the answers to which
will vary from equation to equation. 

21. One of the important achievements of the past century of mathematics
was the establishment of a general procedure that guarantees
the existence and uniqueness of a local-in-time solution to broad
classes of initial conditions and large classes of nonlinear equations,
including all those we have already mentioned above.

22. What we call supercritical depends on the strongest a priori
coercive estimate available.

(ii) Understand in a rigorous mathematical fashion the
range of validity of various approximations. The equations
obtained by various limiting procedures or phenomenological
assumptions can of course be studied
in their own right, as the examples that we have
referred to above are. However, they present us with
additional problems to do with the mechanics of how 
they are derived from equations that we regard as 
more fundamental. It is entirely possible, for exam- 
ple, that the dynamics of a derived system of equa- 
tions leads to behavior that is incompatible with the 
assumptions made in its derivation. Alternatively, a par- 
ticular simplifying assumption, such as spherical sym- 
metry in general relativity or zero vorticity for com- 
pressible fluids, may turn out to be unstable at large 
scales and therefore not a reliable predictor of the gen- 
eral case. These and other similar situations lead to 
important dilemmas: should we persist in studying the 
approximate equations even when, in many cases, we 
face formidable mathematical difficulties (some of which
may turn out to be quite pathological and are per- 
haps related to the nature of the approximation), or 
should we abandon them in favor of the original system 
or a more suitable approximation? Whatever one may 
feel about this in any specific situation, it is clear that 
the problem of understanding, rigorously, the range 
of validity of various approximations is one of the 
fundamental goals in PDEs. 

(iii) Devise and analyze the right equation for studying
the specific geometric or physical problem at hand. This
last goal is equally important even though it is neces- 
sarily vague. The enormously important role played by 
PDEs in various branches of mathematics is more evi- 
dent than ever. One looks in awe at how equations such 
as the Laplace, heat, wave, Dirac, KdV, Maxwell, Yang- 
Mills, and Einstein equations, which were originally 
introduced in specific physical contexts, turned out 
to have very deep applications to seemingly unrelated 
problems in areas such as geometry, topology, alge- 
bra, and combinatorics. Other PDEs appear naturally 
in geometry when we look for embedded objects with 
optimal geometric shapes, such as solutions to isoperi- 
metric problems, minimal surfaces, surfaces of least 
distortion or minimal curvature, or, more abstractly,
connections, maps, or metrics with distinguished prop- 
erties. They are variational in character, just like the 
main equations of mathematical physics. Other equa- 
tions have been introduced with the goal of allowing 
one to deform a general object, such as a map, connec- 
tion, or metric, to an optimal one. They usually arise 
in the form of geometric, parabolic flows. The most
famous example of this is Ricci flow, first introduced
by Richard Hamilton, who hoped to use it to deform
Riemannian metrics into Einstein metrics. Similar ideas
were used earlier to construct, for example, stationary
harmonic maps with the help of a harmonic heat flow,
and self-dual Yang-Mills connections with the help of
a Yang-Mills flow. In addition to the successful use
of Ricci flow to settle the Poincaré conjecture in three
dimensions, another remarkable recent example of the
usefulness of geometric flows is that of the inverse
mean curvature flow, first introduced by Geroch, to settle the
so-called Riemannian version of the Penrose inequality.

Further Reading 

Evans, L. C. 1998. Partial Differential Equations. Gradu- 
ate Studies in Mathematics, volume 19. Providence, RI: 
American Mathematical Society. 

John, F. 1991. Partial Differential Equations. New York: 
Springer. 

Wald, R. M. 1984. General Relativity. Chicago, IL: University
of Chicago Press.

Brezis, H., and F. Browder. 1998. Partial differential equa- 
tions in the 20th century. Advances in Mathematics 135: 
76-144. 

Constantin, P. 2007. On the Euler equations of incompress- 
ible fluids. Bulletin of the American Mathematical Society 
44:603-21. 

Klainerman, S. 2000. PDE as a unified subject. In GAFA
2000, Visions in Mathematics: Towards 2000 (special
issue of Geometric and Functional Analysis), part 1,
pp. 279-315.


IV.13 General Relativity and the
Einstein Equations

Mihalis Dafermos 


Einstein’s formulation of general relativity represents 
one of the great triumphs of modern physics and pro- 
vides the currently accepted classical theory that uni- 
fies gravitation, inertia, and geometry. The Einstein 
equations are the mathematical embodiment of this 
theory. 

The definitive form of the equations, 


$$R_{\mu\nu} - \tfrac{1}{2} R\, g_{\mu\nu} = 8\pi T_{\mu\nu}, \qquad (1)$$


was attained in November 1915; this was the final act of 
Einstein’s eight-year struggle to generalize his principle 
of relativity so as to encompass gravitation, which had 
been described in the earlier “Newtonian” theory by the 
Poisson equation 


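$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2} = 4\pi\rho \qquad (2)$$

for the potential $\phi$ and mass density $\rho$.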

An obvious contrast between the Einstein equations 
(1) and the Poisson equation (2) is that the mysteri- 
ous notation of the former makes it far less obvious 
what they even mean. This has given the subject of 
general relativity a reputation for difficulty and impen- 
etrability. However, this reputation is to some extent 
unwarranted. Both (1) and (2) represent the culmination 
of revolutionary theories whose formulations presuppose
a complicated conceptual framework. For better
or for worse, however, the structure necessary to formulate
Poisson’s equation has been incorporated into
our traditional mathematical notation and school education.
As a result, $\mathbb{R}^3$, with its Cartesian coordinate
system, and notions such as functions, partial deriva- 
tives, masses, forces, and so on, are familiar to people 
with a general mathematical background, while the con- 
ceptual structure of general relativity is much less so, 
both with respect to its basic physical notions and with 
respect to the mathematical objects that are needed to 
model them. However, once one comes to terms with 
these, the equations turn out to be more natural and, 
one might even dare say, simpler.

Thus, the first task of this article is to explain in 
more detail the conceptual structure of general relativ- 
ity. Our aim will be to make it clear what the equations 
(1) actually denote, and, moreover, why they are in a cer- 
tain sense the simplest equations one can write down, 
given the general framework of the theory. This in turn 
will require us to review special relativity and its impli- 
cations for the structure of matter, which will bring 
us to the unified concept of stress-energy-momentum, 
described by a tensorial object T. Finally, we will join 
Einstein in his inspired leap to the notion of a gen- 
eral four-dimensional Lorentzian manifold (M,g) that 
represents our space-time continuum. We shall see 
that equation (1) expresses a relationship between the 
tensor T and the geometry of g as expressed in its 
so-called curvature. 

There is more to truly understanding a theory than 
merely knowing how to write down its governing equa- 
tions. General relativity is associated with some of 
the most spectacular predictions of twentieth-century 
physics: gravitational collapse, black holes, space-time 
singularities, the expansion of the universe. These phenomena
(which were completely unknown in 1915 and
thus played no role in the formulation of the equations
(1)) revealed themselves only when the conceptual
issues surrounding the problem of global dynamics
of solutions were understood. This took a surprisingly
long time, though the story is not as well-known as the
heroic struggle to attain (1). The article will conclude 
with a very brief glimpse into the fascinating dynamics 
of the Einstein equations. 


1 Special Relativity

1.1 Einstein, 1905

Einstein’s 1905 formulation of special relativity stipulated
that all fundamental laws of physics should be
invariant under Lorentz transformations of the frame
of reference defined by $x$, $y$, $z$, and $t$. A Lorentz transformation
is any composition of translations, rotations,
and the Lorentz boost, which is given by the formulas

$$\tilde{x} = \frac{x - vt}{\sqrt{1 - v^2/c^2}}, \qquad \tilde{y} = y, \qquad \tilde{t} = \frac{t - vx/c^2}{\sqrt{1 - v^2/c^2}}, \qquad \tilde{z} = z, \qquad (3)$$

where $c$ is a certain constant and $|v| < c$. Thus, Einstein’s
stipulation was that if one changes coordinates
by means of a Lorentz transformation, then the form
of all fundamental equations will remain the same.
This set of transformations had already been identified
in the context of the study of the vacuum Maxwell
equations for the electric field $E$ and magnetic field $B$:

$$\nabla\cdot E = 0, \qquad \nabla\cdot B = 0, \qquad c^{-1}\partial_t B + \nabla\times E = 0, \qquad c^{-1}\partial_t E - \nabla\times B = 0. \qquad (4)$$

Indeed, the Lorentz transformations are precisely the
transformations that keep the form of the above equations
invariant if we also transform $E$ and $B$ appropriately.
Their significance was emphasized by Poincaré
[VI.61]. However, it was Einstein’s profound insight
to elevate this invariance to the status of fundamental
physical principle, despite its incompatibility
with what we now usually call Galilean relativity,
which corresponds to taking $c \to \infty$ in (3). A surprising
consequence of Lorentz invariance is that the
notion of simultaneity is not absolute but depends on
the observer: given two distinct events that occur at
$(t,x,y,z)$ and $(t,x',y',z')$, it is easy to find a Lorentz
transformation such that the transformed events no
longer have the same $t$-coordinate.

It follows from a celebrated result in partial differential
equations known as the strong Huygens principle,
applied to (4), that electromagnetic disturbances
in vacuum propagate with speed $c$, which we thus identify
as the speed of light. In view of Lorentz invariance,
this statement is independent of the frame! A further
postulate of the principle of relativity is that physical
theories should not allow massive particles to move at
speeds (as measured in any frame) greater than or equal
to $c$.

1.2 Minkowski, 1908

Einstein’s understanding of special relativity was “algebraic.”
It was Minkowski [VI.64] who first understood
its underlying geometric structure, namely, that the
content of the principle was contained in the metric
element

$$-c^2\,dt^2 + dx^2 + dy^2 + dz^2 \qquad (5)$$

defined on $\mathbb{R}^4$ with coordinates $(t,x,y,z)$. We call
$\mathbb{R}^4$ endowed with the metric (5) Minkowski space-time
and denote it $\mathbb{R}^{3+1}$. Points of $\mathbb{R}^{3+1}$ are referred to
as events. The expression (5) is classical notation for
the inner product defined on tangent vectors
$v = (c^{-1}v^0, v^1, v^2, v^3)$, $w = (c^{-1}w^0, w^1, w^2, w^3)$ on $\mathbb{R}^4$ by

$$\langle v, w\rangle = -v^0w^0 + v^1w^1 + v^2w^2 + v^3w^3. \qquad (6)$$

The Lorentz transformations constitute precisely the 
symmetry group of the geometry defined by (5). Ein- 
stein’s principle of relativity could now be understood 
as the principle that the fundamental equations of 
physics must refer to space-time only through geomet- 
ric quantities: that is, quantities that can be defined 
purely in terms of the metric. For example, from this 
point of view the reason that the notion of absolute 
simultaneity is not allowed is that it depends on a privileged
hyperplane through any given point of $\mathbb{R}^{3+1}$.
But there are Lorentz transformations that preserve 
the metric and send this hyperplane to another one 
through the given point, so nothing in the metric can 
pick out one particular hyperplane. Note that if a physi- 
cal theory makes use of geometric quantities only, then 
it is automatically invariant under Lorentz transfor- 
mations: this observation renders many complicated 
calculations unnecessary. 
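
The two claims above, that Lorentz transformations preserve the metric (5) and that they do not preserve simultaneity, can be checked numerically. The following is a minimal sketch: it assumes the standard form of a boost along the x-axis, and the names `boost_x` and `interval` are illustrative rather than taken from the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def boost_x(event, beta):
    """Apply a boost with velocity beta*C along x to an event (t, x, y, z)."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    t_new = gamma * (t - beta * x / C)
    x_new = gamma * (x - beta * C * t)
    return (t_new, x_new, y, z)

def interval(e1, e2):
    """Metric element (5) evaluated on the displacement between two events."""
    dt, dx, dy, dz = (b - a for a, b in zip(e1, e2))
    return -(C * dt) ** 2 + dx ** 2 + dy ** 2 + dz ** 2

# Two events that are simultaneous (same t) and 1 km apart in the first frame.
a = (0.0, 0.0, 0.0, 0.0)
b = (0.0, 1000.0, 0.0, 0.0)
a2, b2 = boost_x(a, 0.6), boost_x(b, 0.6)

print(interval(a, b), interval(a2, b2))  # the two values agree
print(a2[0], b2[0])                      # the t-coordinates no longer agree
```

The interval is the geometric quantity here, so it survives the change of frame, while the equality of the t-coordinates does not.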

Let us explore this geometric point of view further.
Note that nonzero vectors $v$ are naturally classified by
the inner product $\langle\cdot,\cdot\rangle$ into three types,
called timelike, null, and spacelike, according to whether
$\langle v,v\rangle < 0$, $\langle v,v\rangle = 0$, or
$\langle v,v\rangle > 0$, respectively. Idealized point
particles traverse curves $\gamma$ through space-time; these
are called the world lines of the corresponding particles.
The postulate (referred to earlier) that speed in
any frame of reference is bounded by the speed of light
$c$ can now be formulated as the following statement: if
$\gamma$ is the world line of a particle, then the vector
$d\gamma/ds$ must be timelike. (Null lines correspond to light
rays in the geometric optics limit of (4).) This statement is
independent of the parameter $s$ of $\gamma$, but for world lines
we shall always assume that $dt/ds > 0$. To phrase this
more geometrically, $\langle d\gamma/ds, (c^{-1},0,0,0)\rangle < 0$,
which we interpret as the statement that $\gamma$ is future-directed. 

We can now define the “length” of the world line of a
particle by

$$L(\gamma) = \int \sqrt{-\langle\dot\gamma,\dot\gamma\rangle}\,ds. \tag{7}$$

Classically, the above expression would have been written
simply as

$$L(\gamma) = \int_\gamma \sqrt{-(-c^2\,dt^2 + dx^2 + dy^2 + dz^2)},$$

which explains the notation (5). We refer to the quantity
$c^{-1}L(\gamma)$ as proper time. This is the time that is relevant
in local physical processes; in particular, if you are the
particle traversing the world line $\gamma$, then $c^{-1}L(\gamma)$
is the time that you will feel. 
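
To make the definition of proper time concrete, the sketch below approximates the integral (7) by summing over short chords of a world line. It works in units with c = 1, which is an assumption made for readability, and the two sample world lines (`at_rest`, `out_and_back`) are hypothetical examples rather than anything specified in the text.

```python
import math

C = 1.0  # units with c = 1 (an assumption of this sketch)

def proper_time(world_line, s0, s1, steps=20_000):
    """Approximate c^{-1} L(gamma) from (7) by summing over short chords."""
    total = 0.0
    prev = world_line(s0)
    for k in range(1, steps + 1):
        cur = world_line(s0 + (s1 - s0) * k / steps)
        dt, dx, dy, dz = (b - a for a, b in zip(prev, cur))
        ds2 = -(C * dt) ** 2 + dx ** 2 + dy ** 2 + dz ** 2
        total += math.sqrt(-ds2)  # chords of a timelike curve have ds2 < 0
        prev = cur
    return total / C

def at_rest(s):
    """Observer who stays at the spatial origin."""
    return (s, 0.0, 0.0, 0.0)

def out_and_back(s):
    """Observer who moves at speed 0.8 for 5 time units, then returns."""
    x = 0.8 * s if s <= 5.0 else 0.8 * (10.0 - s)
    return (s, x, 0.0, 0.0)

tau_rest = proper_time(at_rest, 0.0, 10.0)
tau_trip = proper_time(out_and_back, 0.0, 10.0)
print(tau_rest, tau_trip)
```

Both world lines join the same two events, yet the travelling observer accumulates less proper time, which illustrates that the quantity in (7) depends on the curve and not only on its endpoints.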

The metric (5) contains three-dimensional Euclidean 
geometry 

dx 2 + dy 2 + dz 2 , 

restricted to t = 0, say. More interestingly, it also 
contains non-Euclidean geometry 

$$\left(1 - \frac{x^2}{r^2}\right)dx^2 + \left(1 - \frac{y^2}{r^2}\right)dy^2 + \left(1 - \frac{z^2}{r^2}\right)dz^2$$

when it is restricted to the hypersurface
$t = c^{-1}r = c^{-1}\sqrt{x^2 + y^2 + z^2}$. It is hard to
overestimate how revolutionary the notion was that the time of
physical processes (including our very sensations) and the length
of measuring rods are two interdependent aspects of
a geometric structure that naturally lives on a four-dimensional
space-time continuum. Indeed, even Einstein initially rejected
Minkowski space-time, preferring to retain the independent reality
of a definite “space,” albeit a space with a relative notion of
simultaneity. Only as a result of his search for general relativity
did he realize that this view is fundamentally
untenable. We shall return to this in section 3. 

2 Relativistic Dynamics and the Unification 
of Energy, Momentum, and Stress 

Besides the space-time concept and its geometrization,
the principle of relativity led to a profound
rearrangement and unification of the fundamental concepts
of dynamics: mass, energy, and momentum. Einstein’s
celebrated relation between mass and energy in
the rest frame,

$$E_0 = mc^2, \tag{8}$$

is the best-known expression of one aspect of this unification.
This relation arises naturally when one attempts
to generalize Newton’s second law $m(dv/dt) = f$ to a
relation between 4-vectors in Minkowski space. 

General relativity has to be formulated in terms of
fields rather than particles. As a first step toward understanding
it, let us look at continuous media. Now,
instead of particles we consider matter fields; the unification
of dynamical concepts encompasses what is
known as stress, and its complete expression is embodied
by the so-called stress-energy-momentum tensor $T$.
This tensor is fundamental to general relativity, so we
have no choice but to familiarize ourselves with it. It
will be the key to the form of the Einstein equations (1)
as well as to the object on their right-hand side. 

For each point $q \in \mathbb{R}^{3+1}$, the stress-energy-momentum
tensor field $T$ gives us a map

$$T : \mathbb{R}^4_q \times \mathbb{R}^4_q \to \mathbb{R} \tag{9}$$

defined by the formula

$$T(w,\tilde w) = \sum_{\alpha,\beta=0}^{3} T_{\alpha\beta}\,w^\alpha\,\tilde w^\beta.$$

Here, $T_{\alpha\beta} = T_{\beta\alpha}$ for each $\alpha$ and $\beta$.
By $\mathbb{R}^4_q$ we mean the space of vectors at $q$. (In Minkowski
coordinates, we often identify $\mathbb{R}^4_q$ with $\mathbb{R}^4$,
but it will be important to distinguish between the two when
considering arbitrary coordinates in section 3.2.) Bilinear maps
of the form (9) are known as covariant 2-tensors. 

If the only matter present is described by what is
known as a perfect fluid, then the components of $T$ are
given by

$$T_{00} = (\rho + p)u^0u^0 - p, \qquad T_{0i} = -(\rho + p)u^iu^0,$$

$$T_{ij} = (\rho + p)u^iu^j + p\,\delta_{ij},$$

where $u$ is the 4-velocity, a timelike vector normalized
such that $\langle u,u\rangle = -c^2$, $\rho$ is the mass-energy,
$p$ is the pressure, and where $\delta_{ij} = 1$ if $i = j$,
$0$ if $i \neq j$, and $i$ and $j$ range over 1, 2, 3. Greek indices
will range over 0, 1, 2, 3. We identify $T_{00}$ with energy,
$T_{0i}$ with momentum, and $T_{ij}$ with stress. These notions are
clearly frame-dependent. Finally, observe that $T(u,u) = \rho c^2$.
This is the field-theoretic version of the famous equation (8). 
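
The frame-independent statement that $T(u,u)$ recovers the mass-energy can be checked with a few lines of arithmetic. The sketch below is written under stated assumptions: units with c = 1, and the standard covariant form $T_{\alpha\beta} = (\rho + p)u_\alpha u_\beta + p\,\eta_{\alpha\beta}$ with indices lowered by the Minkowski metric. The helper names are illustrative.

```python
import math

# Minkowski metric in units with c = 1 (an assumption of this sketch)
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def four_velocity(vx, vy, vz):
    """Future-directed u with <u, u> = -1 for a 3-velocity with |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz))
    return [gamma, gamma * vx, gamma * vy, gamma * vz]

def lower(u):
    """Lower the index of u with the Minkowski metric."""
    return [sum(ETA[a][b] * u[b] for b in range(4)) for a in range(4)]

def perfect_fluid(rho, p, u):
    """Covariant components T_ab = (rho + p) u_a u_b + p eta_ab."""
    ul = lower(u)
    return [[(rho + p) * ul[a] * ul[b] + p * ETA[a][b] for b in range(4)]
            for a in range(4)]

def evaluate(T, v, w):
    """The bilinear map T(v, w) = sum over a, b of T_ab v^a w^b."""
    return sum(T[a][b] * v[a] * w[b] for a in range(4) for b in range(4))

u = four_velocity(0.3, -0.2, 0.5)
T = perfect_fluid(rho=2.0, p=0.7, u=u)
energy_density = evaluate(T, u, u)
print(energy_density)  # equals rho, whatever 3-velocity was chosen
```

The individual components of `T` change with the 3-velocity, which is the frame dependence mentioned above, while `evaluate(T, u, u)` does not.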

In general, $T$ is derived from the totality of all the
matter fields by constitutive functions that depend
on the nature of the matter fields and their interactions.
We need not worry here about such things. But,
regardless of the nature of the matter fields involved,
we always postulate that the following equations are
satisfied:

$$-\partial_0 T_{0\alpha} + \sum_{i=1}^{3} \partial_i T_{i\alpha} = 0.$$

Defining $\nabla^0 = -\partial_0$, $\nabla^i = \partial_i$ and
introducing the Einstein summation convention, under which summation
is implicit when an index appears both upstairs and
downstairs, we may rewrite this as

$$\nabla^\mu T_{\mu\nu} = 0. \tag{10}$$

These equations are Lorentz invariant. 

The above relations embody the conservation of
stress-energy-momentum at a differential level. Integrating
(10) between homologous hypersurfaces and
applying the Minkowski-space version of the divergence
theorem, one obtains global balance laws. If
one assumes that $T_{\alpha\beta}$ is compactly supported, then,
integrating between $t = t_1$ and $t = t_2$, one obtains

$$\int_{t=t_2} T_{0\alpha}\,dx^1\,dx^2\,dx^3 = \int_{t=t_1} T_{0\alpha}\,dx^1\,dx^2\,dx^3. \tag{11}$$

With respect to the chosen Lorentz frame, the zeroth
component of the above equation represents the conservation
of total energy, while the remaining components
represent conservation of total momentum. 

In the case of a perfect fluid, if we close the system
(10) by adjoining a conservation law for particle
number

$$\nabla^\alpha(n u_\alpha) = 0$$

and postulate constitutive relations between $\rho$, $p$, particle
number density $n$, and entropy per particle $s$,
compatible with the laws of thermodynamics, then we
arrive at the so-called relativistic Euler equations. 

3 From Special to General Relativity 

With the elements of special relativity at hand, together 
with their deep implications for the nature of energy, 
momentum, and stress, we can now pass to the formu- 
lation of general relativity. 

3.1 The Equivalence Principle 

Einstein understood as early as 1907 that the most pro- 
found aspect of the gravitational force could not be 
described within the relativity principle as he had for- 
mulated it in 1905. This aspect is what he called the 
equivalence principle. 


The easiest setting in which to understand this principle
is that of the “test particle” with velocity $v(t)$ in
a fixed gravitational field $\phi$. In this case, we have that
the classical gravitational force is given by $f = -m\nabla\phi$,
and we may rewrite Newton’s second law $m(dv/dt) = f$ as

$$\frac{dv}{dt} = -\nabla\phi. \tag{12}$$

Notice that the mass $m$ has dropped out! Thus, the
gravitational field accelerates all objects at a given position
in the same way. This explains the fact, recorded
already in late antiquity by Ioannes Philoponus and
popularized in Western Europe by Galileo, that the
time it takes objects to fall from a given height is
independent of their weight. 

It was Einstein who first interpreted this property as
a sort of covariance with respect to transformations
to noninertial, that is to say accelerated, frames. For
instance, in the case of a constant gravitational field,
which corresponds to the case $\phi(z) = fz$, we can pass
to the accelerated frame

$$\tilde z = z + \tfrac{1}{2} f t^2$$

and write (12) as

$$\frac{d^2\tilde z}{dt^2} = 0. \tag{13}$$

Similarly, one can reverse the argument to “simulate” a
gravitational field when none is present by expressing
(13) in an accelerated frame. 
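
The passage to an accelerated frame can be illustrated numerically. The sketch below integrates the equation of motion in a constant field $\phi(z) = fz$, applies the change of coordinate $\tilde z = z + \tfrac{1}{2}ft^2$, and checks that the transformed trajectory has no acceleration. The function names and the sample values (f = 9.81, initial height and velocity) are assumptions chosen for illustration.

```python
def simulate_fall(z0, v0, f, t_end, steps=1000):
    """Integrate d^2 z / dt^2 = -f, i.e. free fall in the field phi(z) = f z."""
    dt = t_end / steps
    z, v = z0, v0
    path = [(0.0, z)]
    for k in range(1, steps + 1):
        # this update is exact for constant acceleration
        z += v * dt - 0.5 * f * dt * dt
        v -= f * dt
        path.append((k * dt, z))
    return path

def to_accelerated_frame(path, f):
    """Apply z~ = z + f t^2 / 2 to each sample (t, z)."""
    return [(t, z + 0.5 * f * t * t) for t, z in path]

f = 9.81
path = to_accelerated_frame(simulate_fall(z0=100.0, v0=3.0, f=f, t_end=2.0), f)

# Second differences of z~ vanish, so the motion is uniform in the new frame.
second_diffs = [path[i + 1][1] - 2 * path[i][1] + path[i - 1][1]
                for i in range(1, len(path) - 1)]
print(max(abs(d) for d in second_diffs))
```

No mass appears anywhere in the simulation, which is the point of the equivalence principle: the same change of frame removes the field for every test particle at once.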

3.2 Vectors, Tensors, and Equations in 
General Coordinates 

Exactly what the equivalence principle means in general
is somewhat obscure and has been the subject
of debate ever since Einstein introduced it. Nevertheless,
the above considerations suggest that, even in the
absence of gravity, it would be useful to know how
various objects and equations appear when expressed
in arbitrary coordinate systems. That is to say, let us
change from our Minkowski coordinates $x^0, x^1, x^2, x^3$
to the most general coordinate system, which we shall
write as $\bar x^{\bar\mu} = \bar x^{\bar\mu}(x^0,x^1,x^2,x^3)$,
where $\bar\mu$ ranges over 0, 1, 2, 3.

Expressing scalar functions in arbitrary coordinates
poses no problem. But what about vector fields? If $v$
is a vector field expressed in Minkowski coordinates
as $(v^0,v^1,v^2,v^3)$, how do we express $v$ in our new
coordinates $\bar x^{\bar\mu}$? 

One has to think a bit about what a vector field actually
is. The correct point of view is to consider a vector
field $v$ as a first-order differential operator defined
(using Einstein’s summation convention) by
$v(f) = v^\mu\partial_\mu f$. So we seek $v^{\bar\mu}$ such that
$v(f) = v^{\bar\mu}\partial_{\bar\mu}f$ for all functions $f$.
The chain rule then gives us our answer:

$$v^{\bar\mu} = \frac{\partial\bar x^{\bar\mu}}{\partial x^\nu}\,v^\nu. \tag{14}$$

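What about tensors, such as the stress-energy-momentum
tensor $T$? We seek $T_{\bar\mu\bar\nu}$ such that

$$T(u,v) = T_{\bar\mu\bar\nu}\,u^{\bar\mu}\,v^{\bar\nu},$$

where the numbers $u^{\bar\mu}$ are the components of $u$ with
respect to the coordinates $\bar x^{\bar\mu}$ as we have just
calculated them above. (Note that these components depend on
the point $q$. This is why it is now essential to distinguish
$\mathbb{R}^4_q$ from $\mathbb{R}^4$.) Again, the chain rule gives
us the answer:

$$T_{\bar\mu\bar\nu} = T_{\mu\nu}\,\frac{\partial x^\mu}{\partial\bar x^{\bar\mu}}\,\frac{\partial x^\nu}{\partial\bar x^{\bar\nu}}. \tag{15}$$

In classical notation, this is summarized by writing the tensor
as $T_{\mu\nu}\,dx^\mu\,dx^\nu$.
One can interpret the above as a shorthand notation for
(15), but it also tells us how to compute $T_{\bar\mu\bar\nu}$
from $T_{\mu\nu}$ by formally applying the chain rule to $dx^\mu$. 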

There is another covariant symmetric 2-tensor besides
$T$ that is relevant here. This is the Minkowski metric
itself. Indeed, the classical form of the Minkowski
metric (5) corresponds to the representation

$$\eta_{\mu\nu}\,dx^\mu\,dx^\nu,$$

where the $\eta_{\mu\nu}$ for Minkowski coordinates $x^\mu$ are given
by $\eta_{00} = -1$, $\eta_{0i} = 0$, $\eta_{ij} = 1$ if $i = j$, and
$\eta_{ij} = 0$ if $i \neq j$. To avoid the cumbersome notation
$\langle\cdot,\cdot\rangle$, let us refer to the Minkowski metric
as $\eta$. Following the above, we may express $\eta$ in general
coordinates $\bar x^{\bar\mu}$ by

$$\eta_{\bar\mu\bar\nu}\,d\bar x^{\bar\mu}\,d\bar x^{\bar\nu},$$

where $\eta_{\bar\mu\bar\nu}$ is computed by formal application of the
chain rule. 
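
The “formal application of the chain rule” amounts to contracting the metric with two copies of the Jacobian of the coordinate change. The sketch below does this numerically for the Minkowski metric, using the accelerated frame of section 3.1 as the new coordinate system. It assumes units with c = 1 and approximates the Jacobian by central differences; `F`, `jacobian`, and `transform_metric` are hypothetical names introduced for illustration.

```python
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def jacobian(old_from_new, xbar, h=1e-6):
    """J[mu][nb] = d x^mu / d xbar^nb, estimated by central differences."""
    J = [[0.0] * 4 for _ in range(4)]
    for nb in range(4):
        up = list(xbar)
        dn = list(xbar)
        up[nb] += h
        dn[nb] -= h
        xu, xd = old_from_new(up), old_from_new(dn)
        for mu in range(4):
            J[mu][nb] = (xu[mu] - xd[mu]) / (2 * h)
    return J

def transform_metric(old_from_new, xbar):
    """Components eta_{ab} in the new coordinates: eta_{mn} J^m_a J^n_b."""
    J = jacobian(old_from_new, xbar)
    return [[sum(ETA[m][n] * J[m][a] * J[n][b]
                 for m in range(4) for n in range(4))
             for b in range(4)] for a in range(4)]

F = 0.5  # hypothetical constant acceleration, in units with c = 1

def minkowski_from_accelerated(xbar):
    """Minkowski coordinates (t, x, y, z) in terms of (t, x, y, z~)."""
    t, x, y, z_tilde = xbar
    return [t, x, y, z_tilde - 0.5 * F * t * t]

eta_bar = transform_metric(minkowski_from_accelerated, [2.0, 0.0, 0.0, 1.0])
for row in eta_bar:
    print([round(c, 6) for c in row])
```

In the new coordinates the components depend on the point and acquire off-diagonal terms, even though the underlying geometry is still that of Minkowski space.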

It is clear that if one tries to transform an equation
such as (10) into general coordinates, then the
components of $\eta$ and their derivatives will appear in
the equations. Einstein (always thinking “algebraically”)
was seeking laws of motion for both matter and the
gravitational field that would have the same form in
all coordinate systems. As he understood it, this meant
that all objects that appear should transform as tensors
and should be considered a priori “unknown.” He
referred to this principle as “general covariance.” This
suggests that $\eta$ should be replaced by an unknown symmetric
2-tensor. Let us call this 2-tensor $g$. One can of
course try to write down an equation for the “unknown”
$g$ that forces it to be the “known” Minkowski metric $\eta$.
Thus, “general covariance” per se does not force one to
abandon $\eta$. But in view of the fact that $g$ and $T$ have
the same number of components, it was a natural step
to consider $g$ as the embodiment of the gravitational
field and to try to look for an equation that related $g$
and $T$ directly. In this way, the framework of general
relativity was born. 

3.3 Lorentzian Geometry 

The profound insight of replacing the fixed Minkowski
$\eta$ with a dynamic $g$ brought Einstein to what we now
call Lorentzian geometry. Lorentzian geometry generalizes
Minkowski geometry following the blueprint of
Riemann [VI.49]. That is, we replace the Minkowski
metric $\eta$ by a general map

$$g : \mathbb{R}^4_q \times \mathbb{R}^4_q \to \mathbb{R}.$$

In other words, we replace $\eta$ by a symmetric covariant
2-tensor, which is expressed in arbitrary coordinates
$x^\mu$ by

$$g_{\mu\nu}\,dx^\mu\,dx^\nu.$$

Moreover, we require that at each point $q$ the bilinear
form $g(\cdot,\cdot)$ can be diagonalized to the Minkowski
form (6). Loosely speaking, a Lorentzian metric is one
that “looks locally like the Minkowski metric,” just as
a Riemannian metric [I.3 §6.10] looks locally like the
Euclidean metric. 

Just as with the Minkowski metric, the bilinear form
$g$ permits us to classify nonzero vectors $v_q$ at a point $q$
as timelike, null, or spacelike and to define proper times
of world lines $\gamma(s) = (x^0(s),x^1(s),x^2(s),x^3(s))$ by the
formula (7), but with $\langle\dot\gamma,\dot\gamma\rangle$ replaced by
$g_{\mu\nu}\dot x^\mu\dot x^\nu$. It is
in this sense that we can speak of the geometry of $g$. 

In view of Minkowski’s formulation of the special rel- 
ativity principle as the statement that the equations 
of physics refer to space-time only through geomet- 
ric quantities associated with the Minkowski metric, it 
is natural to look for a generalization of this princi- 
ple, and indeed a suitable version immediately suggests 
itself. It is the principle that the equations of physics 
refer to the space-time coordinates only via geometric 
quantities naturally associated with g. 

The kinematic constraint on “test particles” as
formulated geometrically for the Minkowski metric,
namely that $d\gamma/ds$ should be timelike, makes sense for
an arbitrary Lorentzian metric. But how does one formulate
differential equations? For instance, how does
one formulate an analogue of (10) that refers only to $g$?

It turned out that in the Riemannian case, a set of
natural geometric concepts suitable for the task had
already been developed in the nineteenth and early
twentieth centuries by Riemann, Bianchi, Christoffel,
Ricci, and Levi-Civita. These carry over directly to the
Lorentzian case. 

One begins by defining the so-called Christoffel symbols
$\Gamma^\lambda_{\mu\nu}$ by

$$\Gamma^\lambda_{\mu\nu} = \tfrac{1}{2}\,g^{\lambda\rho}(\partial_\mu g_{\rho\nu} + \partial_\nu g_{\mu\rho} - \partial_\rho g_{\mu\nu}).$$

Here, the numbers $g^{\mu\nu}$ are the components of the
“inverse metric” of $g$: that is, they are the unique solution
to the equation $g^{\mu\nu}g_{\nu\lambda} = \delta^\mu_\lambda$,
where, as usual, $\delta^\mu_\lambda = 1$ if $\lambda = \mu$ and 0
otherwise. (It turns out that $g^{\mu\nu}$
is very useful for the calculational gymnastics that are
typical of tensor analysis when it exploits the Einstein
summation convention.) 

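One can then define a differential operator called
a connection, which acts on vector fields by

$$\nabla_\mu v^\nu = \partial_\mu v^\nu + \Gamma^\nu_{\mu\lambda}v^\lambda \tag{16}$$

and on covariant 2-tensors by

$$\nabla_\lambda T_{\mu\nu} = \partial_\lambda T_{\mu\nu} - \Gamma^\sigma_{\lambda\mu}T_{\sigma\nu} - \Gamma^\sigma_{\lambda\nu}T_{\mu\sigma}. \tag{17}$$

The left-hand sides of (16) and (17) define tensors that
can be expressed in any coordinate system by a formal
application of the chain rule. 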

With the help of this differential operator, one could
now write the analogue of equations (10) for an arbitrary
metric $g$ as

$$\nabla^\mu T_{\mu\nu} = 0, \tag{18}$$

where $\nabla^\mu = g^{\mu\nu}\nabla_\nu$ refers to the connection
associated with $g$. 

If we consider a limit as the matter field becomes
concentrated at a point, or rather as the stress-energy-momentum
tensor $T_{\mu\nu}$ is nonzero only on a world line,
then this curve will be a geodesic of $g$: that is, a
curve that locally maximizes the proper time defined
by $g$. These are the analogues of straight timelike lines
in Minkowski space. In this limit, the motion of the
matter does not depend on the nature of the stress-energy-momentum
tensor, but only on the geometry
of the metric that defines geodesics. Thus, all objects
fall in the same way. These considerations give a concrete
realization to the equivalence principle in general
relativity. 

Finally, it is important to remark that for a general
metric $g$, the identity (18) does not imply global conservation
laws (11) for “total energy” and “total momentum.”
Such laws hold only if $g$ has symmetries. The
fact that the fundamental conservation laws survive in
general only at the infinitesimal level is an important
insight into the nature of these principles in physics. 

3.4 Curvature and the Einstein Equations 

It remains, then, to give a set of equations for the metric 
g that relate it to T. In anticipation of a Newtonian limit, 
we expect these equations to be second order, and we 
expect them to implement “general covariance” in the 
simplest way possible: they should refer to no other 
structure but g itself and T. 

Again, Riemannian geometry provides ready-made
tensorial objects that are invariantly associated with $g$.
One can define the Riemann curvature tensor

$$R_{\mu\nu\lambda\rho}\,dx^\mu\,dx^\nu\,dx^\lambda\,dx^\rho$$

with components given by

$$R_{\mu\nu\lambda\rho} = g_{\mu\sigma}(\partial_\rho\Gamma^\sigma_{\nu\lambda} - \partial_\lambda\Gamma^\sigma_{\nu\rho} + \Gamma^\tau_{\nu\lambda}\Gamma^\sigma_{\tau\rho} - \Gamma^\tau_{\nu\rho}\Gamma^\sigma_{\tau\lambda}).$$

One can also define the Ricci curvature

$$R_{\mu\nu}\,dx^\mu\,dx^\nu,$$

a covariant symmetric 2-tensor with components given
by

$$R_{\mu\nu} = g^{\lambda\rho}R_{\lambda\mu\nu\rho},$$

and the scalar curvature

$$R = g^{\mu\nu}R_{\mu\nu}.$$

If $g$ were the induced (Riemannian) metric on a 2-surface
in $\mathbb{R}^3$, then $R$ would just be twice the Gauss
curvature $K$. The above expressions should be thought
of as complicated tensorial generalizations of Gauss
curvature to several dimensions. 
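
As a worked check of the statement that $R$ is twice the Gauss curvature, consider the unit sphere with metric $d\theta^2 + \sin^2\theta\,d\phi^2$. The computation below is a sketch that assumes the standard Christoffel formula and the usual contracted expression for the Ricci tensor, $R_{\mu\nu} = \partial_\sigma\Gamma^\sigma_{\mu\nu} - \partial_\nu\Gamma^\sigma_{\mu\sigma} + \Gamma^\tau_{\mu\nu}\Gamma^\sigma_{\tau\sigma} - \Gamma^\tau_{\mu\sigma}\Gamma^\sigma_{\tau\nu}$; sign conventions differ between texts, so this choice should be read as an assumption.

$$\begin{aligned}
g_{\theta\theta} &= 1, \qquad g_{\phi\phi} = \sin^2\theta,\\
\Gamma^\theta_{\phi\phi} &= -\sin\theta\cos\theta, \qquad \Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta,\\
R_{\theta\theta} &= -\partial_\theta\cot\theta - \cot^2\theta = 1,\\
R_{\phi\phi} &= \partial_\theta(-\sin\theta\cos\theta) - \cos^2\theta + 2\cos^2\theta = \sin^2\theta,\\
R &= g^{\theta\theta}R_{\theta\theta} + g^{\phi\phi}R_{\phi\phi} = 1 + 1 = 2.
\end{aligned}$$

Since the unit sphere has Gauss curvature $K = 1$, this gives $R = 2K$, as claimed.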

The final piece of the puzzle for the formulation of
the Einstein equations (1) is provided by the following
constraint that Einstein demanded: whatever the equation
relating the metric and the stress-energy-momentum
tensor of matter, (18) (the infinitesimal conservation
of stress-energy-momentum) should hold as a
consequence. Now, it turns out that for any metric $g$,
the so-called Bianchi identities imply that

$$\nabla^\mu(R_{\mu\nu} - \tfrac{1}{2}g_{\mu\nu}R) = 0. \tag{19}$$

It is thus natural to postulate a linear relation between
$T_{\mu\nu}$ and the tensor $R_{\mu\nu} - \tfrac{1}{2}g_{\mu\nu}R$.
The form

$$R_{\mu\nu} - \tfrac{1}{2}g_{\mu\nu}R = 8\pi G c^{-4}\,T_{\mu\nu} \tag{20}$$

is then uniquely determined by the requirement that
it should give the correct Newtonian limit when one
makes the identifications

$$g_{00} \sim -(1 + 2\phi/c^2), \qquad g_{0j} \sim 0, \qquad g_{ij} \sim (1 - 2\phi/c^2)\,\delta_{ij}.$$

The form (1) corresponds to the usual units $G = c = 1$.
Note that (1), when written out explicitly, is nonlinear
in the metric components $g_{\mu\nu}$. 

Einstein did not stop at the Newtonian limit. By con- 
sidering geodesic motion in solutions of the linearized 
equations (20), Einstein was able to determine the cor- 
rect value for the anomalous precession of the perihe- 
lion of Mercury, an effect that Newtonian theory was 
unable to explain. Since (20) had no adjustable param- 
eters after determining the Newtonian limit, this was 
a genuine test of the theory. A few years later the 
gravitational “bending” of light was observed. This had 
been calculated theoretically in the context of the geo- 
metric optics approximation where light rays follow 
null geodesics in a fixed space-time background. Post-Newtonian
predictions of (1) have now been verified by 
various solar system tests, confirming general relativity 
in this regime to a high degree of accuracy. 
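
The size of the perihelion effect can be estimated with the standard leading-order formula for the relativistic advance per orbit, $\Delta\varphi = 6\pi GM / (c^2 a(1 - e^2))$. Neither this formula nor the orbital data below appear in the text above; they are commonly quoted reference values and should be treated as assumptions of this sketch.

```python
import math

GM_SUN = 1.32712440018e20      # m^3 / s^2, gravitational parameter of the Sun
C = 299_792_458.0              # m / s
SEMI_MAJOR_AXIS = 5.7909e10    # m, Mercury's orbit
ECCENTRICITY = 0.2056          # Mercury's orbit
PERIOD_DAYS = 87.969           # Mercury's orbital period
DAYS_PER_CENTURY = 36525.0     # Julian century

def precession_per_orbit(gm, a, e):
    """Relativistic perihelion advance per orbit, in radians."""
    return 6.0 * math.pi * gm / (C ** 2 * a * (1.0 - e ** 2))

def arcsec_per_century():
    """Accumulate the per-orbit advance over one century of orbits."""
    orbits = DAYS_PER_CENTURY / PERIOD_DAYS
    radians = precession_per_orbit(GM_SUN, SEMI_MAJOR_AXIS, ECCENTRICITY) * orbits
    return math.degrees(radians) * 3600.0

mercury_shift = arcsec_per_century()
print(f"{mercury_shift:.2f} arcseconds per century")
```

With these inputs the result is close to 43 arcseconds per century. Once $G$ and $c$ are fixed there is nothing left to tune, which is why the agreement with observation counts as a genuine test.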

One special case of (20) is when we postulate that
$T_{\mu\nu} = 0$. The equations then simplify to

$$R_{\mu\nu} = 0. \tag{21}$$

These are known as the vacuum equations. The Minkowski
metric (5) is a particular solution (but not the
only one!). 

The vacuum equations can be derived formally as the
Euler-Lagrange equations [III.96] corresponding to
the so-called Hilbert Lagrangian:

$$\mathcal{L}(g) = \int R\,\sqrt{-g}\,dx^0\,dx^1\,dx^2\,dx^3.$$

(The expression $\sqrt{-g}\,dx^0\,dx^1\,dx^2\,dx^3$ denotes the
natural volume form associated with $g$.) Hilbert [VI.63],
who was following closely Einstein’s struggle to formulate
a theory of gravity with a dynamic metric $g$, arrived
at his Lagrangian (actually a more general version of
the above yielding the coupled Einstein-Maxwell system)
very shortly before Einstein obtained the general
equations (20). 

Many of the most interesting phenomena that come 
from the equations (20) are already present in the vac- 
uum case (21). This is somewhat ironic, because it was 
the forms of T and (10) that dictated (20). Note, in con- 
trast, that in the Newtonian theory (2), the “vacuum” 
equations p = 0 and standard boundary conditions at 
infinity imply $\phi = 0$. Thus, the Newtonian theory of the 
vacuum is trivial. 

The part of the curvature tensor $R_{\mu\nu\lambda\rho}$ that is not
forced to vanish from (21) is known as the Weyl curvature.
This curvature measures the “tidal” distortion
of families of geodesics. Thus, the “local strength” of
gravitational fields in vacuum regions is related in the
Newtonian limit to the tidal forces on macroscopic test
matter, not the norm of the gravitational force. 

3.5 The Manifold Concept 

We have been able to get this far without really addressing
the question of where the metric $g$ is defined. In
passing from the Minkowski metric to a general $g$,
Einstein did not originally have in mind replacing the
domain $\mathbb{R}^4$. But it is clear in the Riemannian case from
the theory of surfaces that the natural object for a
metric to live on is not necessarily $\mathbb{R}^2$ but a general
surface. For instance, the metric
$d\theta^2 + \sin^2\theta\,d\phi^2$ naturally lives on the sphere
$S^2$. In saying this, we are to
understand that one requires several coordinate systems
of the type $(\theta,\phi)$ to cover all of $S^2$. The
$n$-dimensional generalization of the object where Riemannian
or Lorentzian metrics naturally live is a manifold
[I.3 §6.9]. Manifolds are the structures obtained
by consistently smoothly pasting together local coordinate
systems. 

Thus, general relativity allows the space-time continuum
not to be $\mathbb{R}^4$ but instead to be a general manifold
$M$, which may very well be topologically inequivalent
to $\mathbb{R}^4$, just as $S^2$ is inequivalent to $\mathbb{R}^2$.
We call the pair $(M,g)$ a Lorentzian manifold. Properly put, the
unknown in the Einstein equations is not just $g$ but the
pair $(M,g)$. 

It is interesting that this fundamental fact, namely 
that the topology of space-time is not a priori de- 
termined by the equations, arises almost as an after- 
thought. Moreover, it was a thought that took many 
years to be clarified. 

3.6 Waves, Gauges, and Hyperbolicity 

When written out explicitly in arbitrary coordinates
(try it!), the Einstein equations do not appear to be
of any usual type, such as elliptic (like the Poisson
equation [IV.12 §1]), parabolic (like the heat equation
[I.3 §5.4]), or hyperbolic (like the wave equation
[I.3 §5.4]; see [IV.12 §2.5] for more about these different
classes of PDEs). This is related to the fact that,
given a solution, one can form a “new” solution by composing
the old solution with a coordinate transformation.
We can do this for new coordinate systems whose
coordinate transformations differ from the identity
only in a ball. This fact, known as the hole argument,
confused Einstein and his mathematical collaborator
Marcel Grossmann, who were thinking algebraically in
terms of the form of the equations in coordinates, and
temporarily led them to reject “general covariance.”
The resulting backtracking delayed the final correct formulation
of (1) by about two years. The geometric interpretation
of the theory immediately suggests the resolution
to the dilemma: such solutions are to be considered
“the same” because they are the same from the
point of view of all geometric measurements. In modern
language, a solution to the Einstein vacuum equations
(say) is an equivalence class [I.2 §2.3] of space-times
$(M,g)$, where two space-times are equivalent if
there exists a diffeomorphism $\phi$ between them such
that in any open set the metric has the same coordinate
form when one identifies local coordinates by $\phi$. 

It turns out that once these conceptual issues are
overcome, the Einstein equations can be viewed as
hyperbolic. The easiest way to do this is to impose a
gauge: that is to say, a certain restriction on the coordinate
system. Specifically, one requires the coordinate
functions $x^\alpha$ to satisfy the wave equation
$\Box_g x^\alpha = 0$, where the d’Alembertian operator is
defined by the formula

$$\Box_g = \frac{1}{\sqrt{-g}}\,\partial_\mu\bigl(\sqrt{-g}\,g^{\mu\nu}\partial_\nu\bigr).$$

Such coordinates always exist locally and they are traditionally
called harmonic coordinates, although the term
wave coordinates would perhaps be more appropriate.
The Einstein equation can then be written as a system

$$\Box_g g_{\mu\nu} = N_{\mu\nu}(\{g_{\alpha\beta}\},\{\partial_\gamma g_{\alpha\beta}\}),$$

where $N_{\mu\nu}$ is a nonlinear expression that is quadratic
in the $\partial_\gamma g_{\alpha\beta}$. In view of the Lorentzian
signature of the metric, the above system constitutes what is known
as a second-order nonlinear (but quasilinear) hyperbolic
system. 

At this point, it is instructive to make a comparison
with the Maxwell equations. Suppose we are given
an electric field $E$ and a magnetic field $B$ defined on
Minkowski space. A 4-potential is a vector field $A$ such
that $E_i = -\nabla_i A_0 - c^{-1}\partial_t A_i$, and
$B_i = \sum_{j,k=1}^{3}\epsilon_{ijk}\partial_j A_k$.
(Here $\epsilon_{123} = 1$, and $\epsilon_{ijk}$ is totally
antisymmetric, i.e., it transforms to its negative under
permutation of any two indices.) If one wishes to view $A$ as the
fundamental physical object, then one notices that if $A$ is replaced
by the field $\tilde A$, defined by the formula

$$\tilde A = A + (-c^{-1}\partial_t\psi,\ \partial_1\psi,\ \partial_2\psi,\ \partial_3\psi),$$

where $\psi$ is an arbitrary function, then $\tilde A$ is also a
4-potential for $E$ and $B$. One can expect a determined
equation for $A$ only if one imposes further conditions
on it: that is, if one “fixes the gauge.” (The terminology
“gauge” is originally due to Weyl [VI.80].) In the
so-called Lorentz gauge

$$\nabla^\mu A_\mu = 0,$$

the Maxwell equations can be written

$$\Box A_\mu = -c^{-2}\partial_t^2 A_\mu + \sum_{i=1}^{3}\partial_i^2 A_\mu = 0,$$

from which the wave properties are completely manifest.
The gauge-symmetric point of view lived on to
later twentieth century glory: the Yang-Mills equations,
which are a nonlinear generalization of the Maxwell
equations with a similar gauge symmetry, are the central
part of the so-called standard model for particle
physics. 
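
The gauge freedom of the 4-potential can be verified directly. The short derivation below is a sketch that assumes the shift takes the form $\tilde A_0 = A_0 - c^{-1}\partial_t\psi$, $\tilde A_i = A_i + \partial_i\psi$; the sign of the time component is our assumption, chosen so as to be compatible with the expression $E_i = -\nabla_i A_0 - c^{-1}\partial_t A_i$.

$$\begin{aligned}
\tilde E_i &= -\nabla_i\bigl(A_0 - c^{-1}\partial_t\psi\bigr) - c^{-1}\partial_t\bigl(A_i + \nabla_i\psi\bigr)
 = E_i + c^{-1}\nabla_i\partial_t\psi - c^{-1}\partial_t\nabla_i\psi = E_i,\\
\tilde B_i &= \sum_{j,k=1}^{3}\epsilon_{ijk}\,\partial_j\bigl(A_k + \partial_k\psi\bigr)
 = B_i + \sum_{j,k=1}^{3}\epsilon_{ijk}\,\partial_j\partial_k\psi = B_i.
\end{aligned}$$

The last sum vanishes because $\epsilon_{ijk}$ is antisymmetric in $j,k$ while $\partial_j\partial_k\psi$ is symmetric, so $\psi$ drops out of both fields.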

The hyperbolicity property of the Einstein equations 
has two important repercussions. The first is that there 
should exist gravitational waves. This was noted by Ein- 
stein at least as early as 1918, essentially as a result of 
a linearized version of the considerations in the above 
discussion. The second is that there is a well-posed 
initial-value problem [IV.12 §2.4] for the Einstein 
equations (1) with the domain-of-dependence property, 
when these are coupled with appropriate matter equa- 
tions. In particular, this is true in the vacuum case (21). 
The proper conceptual framework to formulate the lat- 
ter problem took a long time to get right, and was 
only completely understood through work of Choquet- 
Bruhat and Geroch in the 1950s and 1960s, based on 
the fundamental concept of global hyperbolicity due to 
Leray. Well-posedness means that one could associate 
a unique solution (in the vacuum case, a Lorentzian 4- 
manifold (M,g) satisfying (21)) with a suitable notion 
of initial data. Of course, “initial data” does not mean 
“data at time t = 0,” since the concept of t = 0 is 
not geometric. Instead, the data take the form of some 
Riemannian 3-manifold $(\Sigma, \bar g)$ with a symmetric covariant
2-tensor $K$. The triple $(\Sigma, \bar g, K)$ has to satisfy the 
so-called Einstein constraint equations. But with this 
notion, the fundamental problem of general relativity, 
despite its revolutionary conceptual structure, is thor- 
oughly classical: to determine the relation of the solu- 
tion to initial data, that is to say, to determine the future 
from knowledge of the “present.” This is the problem 
of dynamics. 

4 The Dynamics of General Relativity 

In this final section we give a taste of our current math- 
ematical understanding of the dynamics of the Einstein 
equations. 




4.1 Stability of Minkowski Space and the 
Nonlinearity of Gravitational Radiation 

In any physical theory in which one can formulate the 
problem of dynamics, the most basic question is the 
stability of the trivial solution. In other words, if we 
make a small change to the “initial conditions,” will the 
resulting change to the solution be small as well? In the 
case of general relativity, this is the question of stability
of the Minkowski space-time $\mathbb{R}^{3+1}$. This fundamental
result was proven for the vacuum equations (21) in 
1993 by Christodoulou and Klainerman. 

The proof of the stability of Minkowski space made 
it possible to formulate the laws of gravitational radi- 
ation rigorously. Gravitational radiation is yet to be 
observed directly, but it has been inferred, originally 
by Hulse and Taylor, from the energy loss of a binary 
system. This work gave them the only Nobel prize 
(1993) directly associated with the Einstein equations! 
The blueprint for the mathematical formulation of the
radiation problem is based on work of Bondi and later
Penrose. One associates with the space-time $(M,g)$ an
ideal boundary “at infinity,” known as null infinity and
denoted $\mathcal{I}^+$. Physically, the points of $\mathcal{I}^+$
correspond to observers who are far away from the isolated
self-gravitating system but who are receiving its signals.
Gravitational radiation can be identified with certain
tensors defined on $\mathcal{I}^+$ from rescaled boundary limits of
various geometric quantities. As Christodoulou was to
discover, the laws of gravitational radiation are themselves
nonlinear, and the nonlinearity is potentially
relevant for observation. 

4.2 Black Holes 

Perhaps no prediction of general relativity is better 
known today than that of black holes. 

The story of black holes begins with the so-called
Schwarzschild metric:

$$-\left(1 - \frac{2m}{r}\right)dt^2 + \left(1 - \frac{2m}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{22}$$

The parameter $m$ here is a positive constant. This is
a solution of the vacuum Einstein equations (21) that
was found in 1916. The original interpretation of (22)
was that it modeled the gravitational field in a vacuum
region outside a star. That is to say, (22) was considered
only in some coordinate range $r > R_0$, for an $R_0 > 2m$,
and the metric was matched at $r = R_0$ to a “static” interior
metric satisfying the coupled Einstein-Euler system
in the coordinate range $r \leq R_0$. (This latter metric
is again of the form (22), but with $m = m(r)$ such that
$m \to 0$ as $r \to 0$.) 

From the theoretical point of view, a natural problem 
poses itself. Suppose we do away with the star alto- 
gether and try to consider (22) for all values of r. What 
happens then to the metric (22) at r = 2m? In the (r, t) 
coordinates, the metric element appears to be singular. 
But this turns out to be an illusion! By a simple change
of coordinates, one can easily extend the metric regularly
as a solution of (21) beyond $r = 2m$. That is,
there exists a manifold $M$ that contains both a region
$r > 2m$ and a region $0 < r < 2m$, separated by a regular
(null) hypersurface $\mathcal{H}^+$. The metric element (22)
is valid everywhere except on $\mathcal{H}^+$, where it must be
rewritten in regular coordinates. 

It turns out that the hypersurface $\mathcal{H}^+$ can be characterized
by an exceptional global property: it defines
the boundary of the region of space-time that can send
signals to null infinity $\mathcal{I}^+$, or, in the physical
interpretation, to distant observers. In general, the set of points
that cannot send signals to null infinity $\mathcal{I}^+$ is known
as the black hole region of space-time. Thus, the region
$0 < r < 2m$ is the black hole region of $M$, and $\mathcal{H}^+$ is
known as the event horizon. 

These issues took a long time to be sorted out, 
partly because the language of global Lorentzian geom- 
etry was developed long after the original formula- 
tion of the Einstein equations. The global geometry of 
the extended space-time M was clarified by Synge in 
around 1950 and finally by Kruskal in 1960. The name 
“black hole” is due to the imaginative physicist John 
Wheeler. From their beginnings as a theoretical curios- 
ity, black holes have become part of the accepted astro- 
physical explanation for a wide variety of phenomena, 
and in particular are thought to represent the end-state 
for the gravitational collapse of many stars. 

4.3 Space-Time Singularities 

A second natural problem poses itself in relation to
the Schwarzschild metric (22), now considered in the
region $r < 2m$ of the extended space-time M: what
happens at $r = 0$?

A computation reveals that as $r \to 0$, the Kretschmann
scalar $R_{\mu\nu\lambda\rho}R^{\mu\nu\lambda\rho}$ blows up. Since this expression is a
geometric invariant, it follows that, unlike the situation
at $r = 2m$, the space-time is not regularly extendable
beyond $r = 0$. Moreover, timelike geodesics (freely falling
observers in the test particle approximation) entering
the black hole region reach $r = 0$ in finite proper time,
so they are “incomplete” in the sense that they cannot
be continued indefinitely. They thus “observe” the
breakdown of the geometry of the space-time metric.
Moreover, macroscopic observers approaching $r = 0$
are torn apart by the gravitational “tidal forces.”
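
To make the contrast between $r = 2m$ and $r = 0$ concrete, here is a minimal sketch. It assumes the standard closed form $48m^2/r^6$ for the Kretschmann scalar of the Schwarzschild metric in units with $G = c = 1$; that formula is not stated in the text above, and the function name `kretschmann` is purely illustrative.

```python
def kretschmann(r, m):
    """Kretschmann scalar of the Schwarzschild metric (assumed form 48 m^2 / r^6)."""
    return 48 * m ** 2 / r ** 6

m = 1.0
# At r = 2m the invariant is finite (it equals 3 / (4 m^4)).
at_horizon = kretschmann(2 * m, m)
# Approaching r = 0 it grows without bound.
near_center = [kretschmann(r, m) for r in (1.0, 0.1, 0.01)]
```

The finite value at $r = 2m$ is consistent with the singular appearance of the metric there being a coordinate effect, while the growth toward $r = 0$ cannot be removed by any change of coordinates.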

In the early years of the subject, it was thought that 
this seemingly pathological behavior was connected to 
the high degree of symmetry of the Schwarzschild met- 
ric and that “generic” solutions would not exhibit such 
phenomena. That this is not the case was shown by 
Penrose’s celebrated incompleteness theorem of 1965. 
This states that solutions to the initial-value problem
for the Einstein equations coupled to appropriate matter
will always contain such incomplete timelike or null
geodesics if the initial data hypersurface is noncompact
and contains what is known as a closed trapped
surface. The Schwarzschild case may appear to suggest
that such incomplete geodesics are associated with
the curvature blowing up. However, the situation can in
fact be very different, as is apparent in the celebrated
Kerr solutions, a remarkable two-parameter family of
solutions to the vacuum equations (21), discovered only
in 1963, which are rotating versions of (22). In the
Kerr solutions, incomplete timelike geodesics meet a 
so-called Cauchy horizon, a smooth boundary of the 
region of space-time that is uniquely determined by 
initial data. 

The theorem of Penrose gives rise to two important 
conjectures. The first, known as weak cosmic censor- 
ship, says roughly that for generic physically plausi- 
ble initial data for suitable Einstein-matter systems, 
geodesic incompleteness, if it occurs, is always con- 
fined to black hole regions. The second, strong cosmic 
censorship, says roughly that for generic admissible 
initial data, incompleteness of the solution is always 
associated with a local obstruction to extendability, 
such as the blow-up of curvature. The latter conjecture 
would ensure that the unique solution of the initial- 
value problem is the only classical space-time that can 
arise from the data. That is to say, it would imply that 
classical determinism holds for the Einstein equations. 

Both conjectures are false if we drop the assumption
that the initial data are generic, and this is one reason
for their difficulty. Indeed, Christodoulou has constructed
spherically symmetric solutions of the coupled
Einstein-scalar field system (arising from regular
initial data) that are geodesically incomplete but do not 
contain black hole regions. Such space-times are said to 
contain naked singularities. 


Naked singularities are easy to construct if one does 
not require that they arise from the collapse of regu- 
lar initial data. An example is the Schwarzschild metric 
(22) for m < 0. This metric, however, does not admit a 
complete asymptotically flat Cauchy hypersurface. This 
fact is related to the celebrated positive energy theorem
of Schoen and Yau. 

4.4 Cosmology 

The space-times (M, g) discussed previously are all
idealized representations of isolated systems. The “rest
of the universe” is excised and replaced by an “asymptotically
flat end”; far-away observers are placed at an
ideal boundary “at infinity.” But what if we are more
ambitious and consider our space-time (M, g) as representing
the whole universe? The study of this latter
problem is known as cosmology.

Observations suggest that on very large scales the 
universe is approximately homogeneous and isotropic. 
This is sometimes known as the Copernican principle.
Interestingly, one cannot solve the Poisson equation (2)
with a constant $\nabla\phi$ and constant nonzero $\rho$ on $\mathbb{R}^4$.
Thus, in Newtonian physics, cosmology never became a
rational science.¹ General relativity, on the other hand,
does admit homogeneous and isotropic solutions as 
well as their perturbations. Indeed, cosmological solu- 
tions of the Einstein equations were studied by Einstein 
himself, de Sitter, Friedmann, and Lemaître in the early
years of the subject. 

When general relativity was formulated, the prevailing
view was that the universe should be static. This
led Einstein to add a term $\Lambda g$ to the left-hand side
of his equations, fine-tuned so as to allow for such a
solution. The constant $\Lambda$ is known as the cosmological
constant. The expansion of the universe is now
considered to be an observational fact, beginning with
the fundamental discoveries of Hubble. Expanding universes
can be modeled to a first approximation by so-called
Friedmann-Lemaître solutions to the Einstein-Euler
system, with various values of $\Lambda$. In the past direction,
these solutions are singular: this singular behavior
is often given the suggestive name “the big bang.”

4.5 Future Developments 

The plethora of exact solutions of the Einstein equa- 
tions gives us a taste of what the qualitative behavior 


1. One can study “Newtonian cosmology” by modifying the foundations
of the Newtonian theory so as to describe the theory with a
nonmetric connection on, say, $\mathbb{T}^3 \times \mathbb{R}$. But this step is of course inspired
by general relativity (see section 3.5).





of more general solutions may be. But a true qualita- 
tive understanding of the nature of general solutions 
has been achieved only in a neighborhood of the very 
simplest solutions. The question of the stability of the 
black hole solutions described above remains unan- 
swered, as do the cosmic censorship conjectures and 
the nature of the singularities that occur generically in 
general relativity. Yet these questions are fundamental 
to the physical interpretation of the theory, and indeed 
to assessing its very validity. 

How likely is it that these questions can ever be 
answered by rigorous mathematics? Problems con- 
cerning the singular behavior of nonlinear hyperbolic 
partial differential equations are notoriously difficult. 
The rich geometric structure of the Einstein equations 
appears at first as a formidable additional complica- 
tion, but it may also turn out to be a blessing. One can 
only hope that the Einstein equations will continue to 
reveal beautiful mathematical structure that answers 
fundamental questions about our physical world.

IV.14 Dynamics

Bodil Branner 


1 Introduction 

Dynamical systems are used to describe the way systems
evolve in time, and have their origin in the laws
of nature that Newton [VI.14] formulated in Principia
Mathematica (1687). The associated mathematical
discipline, the theory of dynamics, is closely related
to many parts of mathematics, in particular analysis,
topology, measure theory, and combinatorics. It
is also highly influenced and stimulated by problems
from the natural sciences, such as celestial mechanics,
hydrodynamics, statistical mechanics, meteorology,
and other parts of mathematical physics, as well
as reaction chemistry, population dynamics, and economics.

Computer simulations and visualizations play an im- 
portant role in the development of the theory; they have 
changed our views about what should be considered 
typical, rather than special and atypical. 

There are two main branches of dynamical systems:
continuous and discrete. The main focus of this paper
will be holomorphic dynamics, which concerns discrete
dynamical systems of a special kind. These systems
are obtained by taking a holomorphic function
[I.3 §5.6] $f$ defined on the complex numbers and applying
it repeatedly. An important example is when $f$ is a
quadratic polynomial.


1.1 Two Basic Examples 

It is interesting to note that both types of dynamical
system, continuous and discrete, can be well illustrated
by examples that date back to Newton.


(i) The N-body problem models the motion in the
solar system of the sun and $N - 1$ planets, and does so
in terms of differential equations. Each body is represented
by a single point, namely its center of mass, and
the motion is determined by Newton's universal law
of gravitation, also called the inverse square law. This
says that the gravitational force between two bodies is
proportional to each of their masses and inversely proportional
to the square of the distance between them.
Let $r_i$ denote the position vector of the $i$th body, $m_i$ its
mass, and $g$ the universal gravitational constant. Then
the force on the $i$th body due to the $j$th has magnitude
$g m_i m_j / \|r_j - r_i\|^2$, and its direction is along the line
from $r_i$ to $r_j$. We can work out the total force on the $i$th
body by adding up all these forces for $j \ne i$. Since a unit
vector in the direction from $r_i$ to $r_j$ is $(r_j - r_i)/\|r_j - r_i\|$,
we obtain a force of

$$g \sum_{j \ne i} m_i m_j \frac{r_j - r_i}{\|r_j - r_i\|^3}.$$

(There is a cube on the bottom rather than a square
in order to compensate for the magnitude of $r_j - r_i$.)
A solution to the N-body problem is a set of differentiable
vector functions $(r_1(t), \dots, r_N(t))$, depending on
time $t$, that satisfy the $N$ differential equations

$$m_i r_i''(t) = g \sum_{j \ne i} m_i m_j \frac{r_j(t) - r_i(t)}{\|r_j(t) - r_i(t)\|^3},$$

which result from Newton’s second law, which states
that force = mass × acceleration.

Newton was able to solve the two-body problem ex- 
plicitly. By neglecting the influence of other planets, 
he derived the laws formulated by Johannes Kepler, 
which describe how each planet moves in an elliptic 
orbit around the sun. However, the jump to N > 2 
makes an enormous difference to the complication of 
the problem: except in very special cases, the system 
of equations can no longer be solved explicitly (see 
the three-body problem [V.36]). Nevertheless, New- 
ton’s equations are of great practical importance when 
it comes to guiding satellites and other space missions. 
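
As an illustration of the force law in (i), the following minimal sketch evaluates the total gravitational force on each body from a list of positions and masses. The function name `gravitational_forces`, the default value `g=1.0`, and the sample data are illustrative assumptions, not values from the text.

```python
import math

def gravitational_forces(positions, masses, g=1.0):
    """Total force on each body: g * sum_j m_i m_j (r_j - r_i) / |r_j - r_i|^3."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j and its length.
            diff = [pj - pi for pi, pj in zip(positions[i], positions[j])]
            dist = math.sqrt(sum(d * d for d in diff))
            # The cube in the denominator normalizes diff to a unit vector.
            scale = g * masses[i] * masses[j] / dist ** 3
            for k in range(3):
                forces[i][k] += scale * diff[k]
    return forces

# Hypothetical three-body configuration (g is a placeholder unit value).
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
masses = [3.0, 1.0, 2.0]
forces = gravitational_forces(positions, masses)
```

Because each pair of bodies exerts equal and opposite forces, the components of `forces` sum to zero up to rounding, which is a quick sanity check on any implementation.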

(ii) Newton’s method [II.4 §2.3] for solving equations
is quite different and does not involve differential
equations. We consider a differentiable function $f$
of one real variable and wish to determine a zero of $f$,
that is, a solution to the equation $f(x) = 0$. Newton’s
idea was to define a new function:

$$N_f(x) = x - \frac{f(x)}{f'(x)}.$$

To put this more geometrically, $N_f(x)$ is the x-coordinate
of the point where the tangent line to the graph
$y = f(x)$ at the point $(x, f(x))$ crosses the x-axis.
(If $f'(x) = 0$, then this tangent line is horizontal and
$N_f(x)$ is not defined.)

Under many circumstances, if $x$ is close to a zero
of $f$, then $N_f(x)$ is significantly closer. Therefore, if
we start with some value $x_0$ and form the sequence
obtained by repeated application of $N_f$, that is, the
sequence $x_0, x_1, x_2, \dots$, where $x_1 = N_f(x_0)$,
$x_2 = N_f(x_1)$, and so on, we can expect that this sequence
will converge to a zero of $f$. And this is true: if the
initial value $x_0$ is sufficiently close to a zero, then the
sequence does indeed converge toward that zero, and 
does so extremely quickly, basically doubling the num- 
ber of correct digits in each step. This rapid conver- 
gence makes Newton’s method very useful for numeri- 
cal computations. 
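
The rapid convergence can be observed in a small experiment. The sketch below iterates $N_f$ for the illustrative choice $f(x) = x^2 - 2$ (an assumption made here, with zero $\sqrt{2}$) and records the error after each step; the helper names are hypothetical.

```python
def newton_step(f, df, x):
    """One application of N_f: x - f(x) / f'(x)."""
    slope = df(x)
    if slope == 0:
        raise ZeroDivisionError("horizontal tangent: N_f(x) is undefined")
    return x - f(x) / slope

def newton_orbit(f, df, x0, steps):
    """The sequence x0, N_f(x0), N_f(N_f(x0)), ... with `steps` applications."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(newton_step(f, df, orbit[-1]))
    return orbit

f = lambda x: x * x - 2      # illustrative function with zero sqrt(2)
df = lambda x: 2 * x
orbit = newton_orbit(f, df, 1.0, 6)
errors = [abs(x - 2 ** 0.5) for x in orbit]
```

Printing `errors` shows the error shrinking roughly quadratically until it reaches the limit of floating-point precision.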

1.2 Continuous Dynamical Systems 

We can think of a continuous dynamical system as a sys- 
tem of first-order differential equations, which deter- 
mine how the system evolves in time. A solution is 
called an orbit or trajectory, and is parametrized by 
a number t, which one usually thinks of as time, that 
takes real values and varies continuously: hence the 
name “continuous” dynamical system. A periodic orbit
of period $T$ is a solution that repeats itself after time $T$,
but not earlier.

The differential equation $x''(t) = -x(t)$ is of second
order, but it is nevertheless a continuous dynamical
system because it is equivalent to the system of
two first-order differential equations $x_1'(t) = x_2(t)$ and
$x_2'(t) = -x_1(t)$. In a similar way, the system of differential
equations of the N-body problem can be brought
into standard form by introducing new variables. The
equations are equivalent to a system of $6N$ first-order
differential equations in the variables of the position
vectors $r_i = (x_{i1}, x_{i2}, x_{i3})$ and the velocity vectors
$r_i' = (y_{i1}, y_{i2}, y_{i3})$. Thus, the N-body problem is a good
example of a continuous dynamical system.
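
The reduction to first order is also what makes such equations convenient to integrate numerically. The following sketch treats $x'' = -x$ as the system $x_1' = x_2$, $x_2' = -x_1$ and advances it with the classical fourth-order Runge-Kutta scheme; the choice of integrator, the step count, and the initial condition $(1, 0)$ are assumptions made for illustration.

```python
import math

def rhs(state):
    """Right-hand side of the first-order system x1' = x2, x2' = -x1."""
    x1, x2 = state
    return (x2, -x1)

def rk4_step(state, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

steps = 1000
dt = 2 * math.pi / steps
state = (1.0, 0.0)            # x(0) = 1, x'(0) = 0, so x(t) = cos t
for _ in range(steps):
    state = rk4_step(state, dt)
```

After time $2\pi$ the computed state is back near $(1, 0)$, reflecting the fact that every nonconstant orbit of this system is periodic of period $2\pi$.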

In general, if we have a dynamical system consisting 
of n equations, then we can write the ith equation in 
the form 

$$x_i'(t) = f_i(x_1(t), \dots, x_n(t)),$$

or alternatively we can write all the equations at once
in the form $x'(t) = f(x(t))$, where $x(t)$ is the vector
$(x_1(t), \dots, x_n(t))$ and $f = (f_1, \dots, f_n)$ is a function
from $\mathbb{R}^n$ to $\mathbb{R}^n$. Note that $f$ is assumed not to depend
on $t$. If it does, then the system can be brought into
standard form by adding the variable $x_{n+1} = t$ and the
differential equation $x_{n+1}'(t) = 1$, which increases the
dimension of the system from $n$ to $n + 1$.

The simplest systems are linear ones, where $f$ is
a linear map: that is, $f(x)$ is given by $Ax$ for some
constant $n \times n$ matrix $A$. The system above,
$x_1'(t) = x_2(t)$ and $x_2'(t) = -x_1(t)$, is an example of a linear
system. Most systems, however, including the one for
the N-body problem, are nonlinear. If the function $f$
is “nice” (for instance, differentiable), then uniqueness
and existence of solutions are guaranteed for any initial
point $x_0$. That is, there is exactly one solution that
passes through the point $x_0$ at time $t = 0$. For example,
in the N-body problem there is exactly one solution for 
any given set of initial position vectors and initial veloc- 
ity vectors. It also follows from uniqueness that any 
pair of orbits must either coincide or be totally disjoint. 
(Bear in mind that the word “orbit” in this context does 
not mean the set of positions of a single point mass, 
but rather the evolution of the vector that represents 
all the positions and velocities of all the masses.) 

Although it is seldom possible to express solutions to 
nonlinear systems explicitly, we know that they exist, 
and we call the dynamical system deterministic since 
solutions are completely determined by their initial 
conditions. For a given system and given initial conditions
it is therefore theoretically possible to predict
its entire future evolution.

1.3 Discrete Dynamical Systems 

A discrete dynamical system is a system that evolves 
in jumps: “time,” in such a system, is best repre- 
sented by an integer rather than a real number. A good 
example is Newton’s method for solving equations. In 
this instance, the sequence of points we saw earlier,
$x_0, x_1, \dots, x_k, \dots$, where $x_k = N_f(x_{k-1})$, is called the
orbit of $x_0$. We say that it is obtained by iteration of
the function $N_f$, i.e., by repeated application of the
function.

This idea can easily be generalized to other mappings
$F : X \to X$, where $X$ could be the real axis,
an interval in the real axis, the plane, a subset of the
plane, or some more complicated space. The important
thing is that the output $F(x)$ of any input $x$ can
be used as the next input. This guarantees that the
orbit of any $x_0$ in $X$ is defined for all future times.
That is, we can define a sequence, $x_0, x_1, \dots, x_k, \dots$,
where $x_k = F(x_{k-1})$ for every $k$. If the function $F$
has an inverse $F^{-1}$, then we can iterate both forwards
and backwards and obtain the full orbit of $x_0$ as the
bi-infinite sequence $\dots, x_{-2}, x_{-1}, x_0, x_1, x_2, \dots$, where
$x_k = F(x_{k-1})$ and, equivalently, $x_{k-1} = F^{-1}(x_k)$, for all
integer values of $k$.

The orbit of $x_0$ is periodic of period $k$ if it repeats
itself after time $k$, but not earlier, i.e., if $x_k = x_0$, but
$x_j \ne x_0$ for $j = 1, \dots, k - 1$. The orbit is called pre-periodic
if it is eventually periodic, in other words if
there exist $l \ge 1$ and $k \ge 1$ such that $x_l$ is periodic
of period $k$, but none of the $x_j$ for $0 \le j < l$ are periodic.
The notion of pre-periodicity has no counterpart
in continuous dynamics.
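
For maps whose values can be compared exactly, periodicity and pre-periodicity can be detected by recording when each point of the orbit is first visited. The sketch below is one way to do this; the function name `preperiod_and_period` and the two sample maps are illustrative assumptions, and the approach relies on exact (hashable) values such as integers or fractions rather than rounded floating-point numbers.

```python
def preperiod_and_period(F, x0, max_steps=10_000):
    """Return (l, k): after l steps the orbit of x0 enters a cycle of period k.

    l == 0 means the orbit of x0 is periodic; l >= 1 means it is pre-periodic.
    """
    first_seen = {}
    x = x0
    for step in range(max_steps):
        if x in first_seen:
            l = first_seen[x]
            return l, step - l
        first_seen[x] = step
        x = F(x)
    raise ValueError("no repetition found within max_steps")

# 0 -> -2 -> 2 -> 2 -> ... : pre-periodic, reaching a fixed point after 2 steps.
pre = preperiod_and_period(lambda x: x * x - 2, 0)
# 3 -> -3 -> 3 -> ... : periodic of period 2.
per = preperiod_and_period(lambda x: -x, 3)
```

Here `pre` is `(2, 1)` and `per` is `(0, 2)`.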

A discrete dynamical system is deterministic, since
the orbit of any given initial point $x_0$ is completely
determined once you know $x_0$.

1.4 Stability 

The modern theory of dynamics was greatly influenced
by the work of Poincaré [VI.61], and in particular
by his prize-winning memoir on the 3-body problem,
followed by three more elaborate volumes on celestial
mechanics, all from the late nineteenth century.
The memoir was written in response to a competition
where one of the proposed problems concerned stability
of the solar system. Poincaré introduced the so-called
restricted 3-body problem, where the third body
is assumed to have an infinitely small mass: it does not
influence the motion of the other two bodies but it is
influenced by them. Poincaré’s work became the prelude
to topological dynamics, which focuses on topological
properties of solutions to dynamical systems and
takes a qualitative approach to them.

Of special interest is the long-term behavior of a system.
A periodic orbit is called stable if all orbits through
points sufficiently close to it stay close to it at all future
times. It is called asymptotically stable if all sufficiently
close orbits approach it as time tends to infinity. Let
us illustrate this by two linear examples in discrete
dynamics. For the real function $F(x) = -x$, all points
have a periodic orbit: 0 has period 1 and all other $x$
have period 2. Every orbit is stable, but none is asymptotically
stable. The real function $G(x) = \tfrac{1}{2}x$ has only
one periodic orbit, namely 0. Since $G(0) = 0$, this orbit
has period 1, and we call it a fixed point. If you take
any number and repeatedly divide it by 2, then the resulting
sequence will approach 0, so the fixed point 0 is
asymptotically stable.
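
The difference between the two notions shows up directly when the maps $F$ and $G$ are iterated. In the sketch below, the helper `orbit` and the starting values 1.0 and 1.25 are illustrative choices.

```python
def orbit(F, x0, steps):
    """The finite orbit x0, F(x0), ..., obtained by `steps` applications of F."""
    xs = [x0]
    for _ in range(steps):
        xs.append(F(xs[-1]))
    return xs

F = lambda x: -x
G = lambda x: x / 2

# Under F, two nearby orbits stay exactly as far apart as they started:
# close, but never closer, so stability without asymptotic stability.
gap_F = [abs(a - b) for a, b in zip(orbit(F, 1.0, 10), orbit(F, 1.25, 10))]

# Under G, the distance to the fixed point 0 halves at every step.
dist_G = [abs(x) for x in orbit(G, 1.0, 10)]
```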

One of the methods introduced by Poincaré during
his study of the 3-body problem was a reduction from
a continuous dynamical system, in dimension $n$, say,
to an associated discrete dynamical system, a mapping
in dimension $n - 1$. The idea is as follows. Suppose we
have a periodic orbit of period $T > 0$ in some continuous
system. Choose a point $x_0$ on the orbit and a hypersurface
$\Sigma$ through $x_0$, for instance part of a hyperplane,
such that the orbit cuts through $\Sigma$ at $x_0$. For any point in
$\Sigma$ that is sufficiently close to $x_0$, one can follow its orbit
around and see where it next intersects $\Sigma$. This defines
a transformation, known as the Poincaré map, which
takes the original point to the next point of intersection
of its orbit with $\Sigma$. It follows from the fact that dynamical
systems have unique solutions that every Poincaré
map is injective in the neighborhood of $x_0$ (within $\Sigma$)
for which the Poincaré map is defined. One can perform
both forwards and backwards iterations. Note that the
periodic orbit of $x_0$ in the continuous system is stable
(respectively, asymptotically stable) exactly when
the fixed point $x_0$ of the Poincaré map in the discrete
system is stable (respectively, asymptotically stable).
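
A Poincaré map can be written down explicitly in simple cases. The sketch below uses a planar system chosen here purely for illustration (it does not appear in the text): in polar coordinates, $r' = r(1 - r)$ and $\theta' = 1$, whose unit circle is a periodic orbit of period $2\pi$. Taking $\Sigma$ to be the half-line $\theta = 0$ and using the closed-form solution $r(t) = r_0 e^t / (1 - r_0 + r_0 e^t)$ of the radial equation gives a one-dimensional map.

```python
import math

def poincare_map(r):
    """First return to the half-line theta = 0 for r' = r(1 - r), theta' = 1."""
    growth = math.exp(2 * math.pi)   # the angle needs time 2*pi to come back round
    return r * growth / (1 - r + r * growth)

def iterate(r0, steps):
    """Apply the Poincaré map `steps` times starting from radius r0."""
    r = r0
    for _ in range(steps):
        r = poincare_map(r)
    return r

inside = iterate(0.5, 3)    # starts inside the periodic orbit r = 1
outside = iterate(2.0, 3)   # starts outside it
```

Both `inside` and `outside` end up very close to 1: the fixed point $r = 1$ of the map is asymptotically stable, which matches the asymptotic stability of the periodic orbit in the continuous system.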

1.5 Chaotic Behavior 

The notion of chaotic dynamics arose in the 1970s. It
has been used in different settings, and there is no single
definition that covers all uses of the term. However,
the property that best characterizes chaos is the phenomenon
of sensitive dependence on initial conditions.
Poincaré was the first to observe sensitivity to initial
conditions in his treatment of the 3-body problem.

Instead of describing his observations let us look
at a much simpler example from discrete dynamics.
Take as a dynamical space $X$ the half-open unit interval
$[0, 1)$, and let $F$ be the function that doubles a number
and reduces it modulo 1. That is, $F(x) = 2x$ when
$0 \le x < \tfrac{1}{2}$ and $F(x) = 2x - 1$ when $\tfrac{1}{2} \le x < 1$. Let $x_0$
be a number in $X$ and let its iterates be $x_1 = F(x_0)$,
$x_2 = F(x_1)$, and so on. Then $x_k$ is the fractional part of
$2^k x_0$. (The fractional part of a real number $t$ is what you
get when you subtract the largest integer less than or equal to $t$.)

A good way to understand the behavior of the sequence
$x_0, x_1, x_2, \dots$ of iterates is to consider the
binary expansion of $x_0$. Suppose, for example, that this
begins 0.110100010100111.... To double a number
when it is written in binary, all you have to do is shift
every digit to the left (just as one does in the decimal
system when multiplying by 10). So $2x_0$ will have
a binary expansion that begins 1.10100010100111....
To obtain $F(x_0)$, we have to take the fractional part
of this, which we do by subtracting the initial 1. This
gives us $x_1 = 0.10100010100111\dots$. Repeating the
process we find that $x_2 = 0.0100010100111\dots$,
$x_3 = 0.100010100111\dots$, and so on. (Notice that when we
calculated $x_3$ from $x_2$ there was no need to subtract 1,
since the first digit after the “decimal point” was a 0.)
Now consider a different choice of initial number,
$x_0' = 0.110100010110110\dots$. The first ten digits after the
decimal point are the same as the first ten digits of
$x_0$, so $x_0'$ is very close to $x_0$. However, if we apply $F$
ten times to $x_0$ and $x_0'$, then their respective eleventh
digits have shifted leftwards and become the first digits
of $x_{10} = 0.00111\dots$ and $x_{10}' = 0.10110\dots$. These two
numbers differ by almost $\tfrac{1}{2}$, so they are not at all close.

In general, if we know $x_0$ to an accuracy of $k$ binary
digits and no more, then after $k$ iterations of the map $F$
we have lost all information: $x_k$ could lie anywhere in
the interval $[0, 1)$. Therefore, even though the system is
deterministic, it is impossible to predict its long-term
behavior without knowing $x_0$ with perfect accuracy.
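
The loss of information can be checked with exact arithmetic. The sketch below treats the two binary expansions from the example as if they terminated after the fifteen digits shown (an assumption, since the text leaves the later digits unspecified) and applies the doubling map until the first digit on which they disagree reaches the front. The helper names are illustrative.

```python
from fractions import Fraction

def doubling_map(x):
    """F(x) = 2x mod 1 on [0, 1)."""
    y = 2 * x
    return y - 1 if y >= 1 else y

def from_binary(digits):
    """Exact value of the finite binary expansion 0.d1 d2 d3 ..."""
    return Fraction(int(digits, 2), 2 ** len(digits))

a_digits = "110100010100111"
b_digits = "110100010110110"
a, b = from_binary(a_digits), from_binary(b_digits)

# Number of leading binary digits on which the two starting points agree.
shared = next(i for i, (p, q) in enumerate(zip(a_digits, b_digits)) if p != q)

initial_gap = abs(a - b)
for _ in range(shared):
    a, b = doubling_map(a), doubling_map(b)
final_gap = abs(a - b)
```

Each application of the map doubles the gap between the two points for as long as their leading digits agree, so a gap of $15/2^{15}$ grows to $15/32$, just under $\tfrac{1}{2}$.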

This is true in general: it is impossible to make long- 
term predictions in any part of a dynamical system that 
shows sensitivity to initial conditions unless the initial 
conditions are known exactly. In practical applications 
this is never the case. For instance, when applying a 
mathematical model to perform weather forecasts, one 
does not know the initial conditions exactly, and this is 
why reliable long-term forecasting is impossible. 


Sensitivity is also important in the notion of so-called
strange attractors. A set $A$ is called an attractor if all
orbits that start in $A$ stay in $A$ and if all orbits through
nearby points get closer and closer to $A$. In continuous
systems, some simple sets that can be attractors are 
equilibrium points, periodic orbits (limit cycles), and 
surfaces such as a torus. In contrast to these examples, 
strange attractors have both complicated geometry and 
complicated dynamics: the geometry is fractal and the 
dynamics sensitive. We shall see examples of fractals 
later on. 

The best-known strange attractor is the Lorenz at- 
tractor. In the early 1960s, the meteorologist Edward N. 
Lorenz studied a three-dimensional continuous dynam- 
ical system that gave a simplified model of heat flow. 
While doing so, he noticed that if he restarted his com- 
puter with its initial conditions chosen as the output 
of an earlier calculation, then the trajectory started to 
diverge from the one he had previously observed. The 
explanation he found was that the computer used more 
precision in its internal calculations than it showed 
in its output. For this reason, it was not immediately 
apparent that the initial conditions were in fact very
slightly different from before. Because the system was
sensitive, this tiny difference eventually made a much
bigger difference. He coined the poetic phrase “the butterfly
effect” to describe this phenomenon, suggesting
that a small disturbance such as a butterfly flapping its
wings could in time have a dramatic effect on the long-term
evolution of the weather and trigger a tornado
thousands of miles away. Computer simulations of the 
Lorenz system indicate that solutions are attracted to 
a complicated set that “looks like” a strange attractor. 
The question of whether it actually was one remained 
open for a long time. It is not obvious how trustwor- 
thy computer simulations are when one is studying 
sensitive systems, since the computer rounds off the 
numbers in each step. In 1998 Warwick Tucker gave 
a computer-assisted proof that the Lorenz attractor 
is in fact a strange attractor. He used interval arithmetic,
where numbers are represented by intervals and
estimates can be made precise.
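
The sensitivity that Lorenz observed is easy to reproduce in a rough numerical experiment. The sketch below assumes the usual form of the Lorenz equations with the commonly used parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$, none of which are given in the text, and integrates two trajectories whose initial conditions differ by $10^{-9}$ with a fourth-order Runge-Kutta step. It is an ordinary floating-point simulation, not interval arithmetic, so it illustrates the phenomenon without proving anything.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (assumed standard parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, state, dt):
    """One classical Runge-Kutta step of size dt for x' = f(x)."""
    def shifted(slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))
    k1 = f(state)
    k2 = f(shifted(k1, dt / 2))
    k3 = f(shifted(k2, dt / 2))
    k4 = f(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def separation(p, q):
    """Euclidean distance between two states."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

p = (1.0, 1.0, 1.0)
q = (1.0 + 1e-9, 1.0, 1.0)      # almost the same initial condition
gaps = [separation(p, q)]
for _ in range(4000):           # 40 time units with dt = 0.01
    p = rk4_step(lorenz, p, 0.01)
    q = rk4_step(lorenz, q, 0.01)
    gaps.append(separation(p, q))
```

Plotting `gaps` on a logarithmic scale shows the separation growing by many orders of magnitude before it levels off at the size of the attractor.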

For topological reasons, sensitivity to initial condi- 
tions is possible for continuous dynamical systems 
only when the dimension is at least 3. For discrete 
systems where the map F is injective, the dimension 
must be at least 2. However, for noninjective mappings, 
sensitivity can occur for one-dimensional systems, as 
we saw with the example given earlier. This is one of
the reasons that discrete one-dimensional dynamical
systems have been intensively studied.

1.6 Structural Stability 

Two dynamical systems are said to be topologically 
equivalent if there is a homeomorphism (a continuous 
map with continuous inverse) that maps the orbits of 
one system onto the orbits of the other, and vice versa. 
Roughly speaking, this means that there is a continu- 
ous change of variables that turns one system into the 
other. 

As an example, consider the discrete dynamical system
given by the real quadratic polynomial
$F(x) = 4x(1 - x)$. Suppose we were to make the substitution
$y = -4x + 2$. How could we describe the system in
terms of $y$? Well, if we apply $F$, then we change $x$ to
$4x(1 - x)$, which means that $y = -4x + 2$ changes
to $-4F(x) + 2 = -16x(1 - x) + 2$. But

$$\begin{aligned}
-16x(1 - x) + 2 &= 16x^2 - 16x + 2 \\
&= (-4x + 2)^2 - 2 \\
&= y^2 - 2.
\end{aligned}$$

Therefore, the effect of applying the polynomial function
$F$ to $x$ is to apply a different polynomial function
to $y$, namely $Q(y) = y^2 - 2$. Since the change of variables
from $x$ to $-4x + 2$ is continuous and invertible,
one says that the functions $F$ and $Q$ are conjugate.

Because $F$ and $Q$ are conjugate, the orbit of any
$x_0$ under $F$ becomes, after the change of variables,
the orbit of the corresponding point $y_0 = -4x_0 + 2$
under $Q$. That is, for every $k$ we have $y_k = -4x_k + 2$. The
two systems are topologically equivalent: if you want to
understand the dynamics of one of them, you can do so by
studying the other, since its dynamics will be qualitatively
the same.
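
The relation $y_k = -4x_k + 2$ can be verified exactly along an orbit by using rational arithmetic. In this sketch the name `h` for the change of variables and the starting point $x_0 = 1/7$ are illustrative choices.

```python
from fractions import Fraction

F = lambda x: 4 * x * (1 - x)
Q = lambda y: y * y - 2
h = lambda x: -4 * x + 2      # the change of variables y = -4x + 2

x = Fraction(1, 7)
y = h(x)
pairs = []
for _ in range(6):
    pairs.append((x, y))
    x, y = F(x), Q(y)          # advance each system independently

conjugate_along_orbit = all(h(xk) == yk for xk, yk in pairs)
```

The two orbits are computed separately, one with $F$ and one with $Q$, and yet they correspond under `h` at every step, which is what conjugacy asserts.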

For continuous dynamical systems the notion of 
equivalence is slightly looser in that we allow a homeo- 
morphism between two topologically equivalent sys- 
tems to map one orbit onto another without respect- 
ing the exact time evolution, but for discrete dynami- 
cal systems we must demand that the time evolution is 
respected as in the example above: in other words, we 
insist on conjugacy. 

The term dynamical system was coined by Stephen
Smale in the 1960s and has taken off since then. Smale
developed the theory of robust systems, also named structurally
stable systems, a notion that was introduced in
the 1930s by Alexander A. Andronov and Lev S. Pontryagin.
A dynamical system is called structurally stable
if all systems sufficiently close to it, belonging to
some specified family of systems, are in fact topologically
equivalent to it. We say that they all have the
same qualitative behavior. An example of the kind of
family one might consider is the set of all real quadratic
polynomials of the form $x^2 + a$. This family is
parametrized by $a$, and the systems close to a given
polynomial $x^2 + a_0$ are all the polynomials $x^2 + a$ for
which $a$ is close to $a_0$. We shall return to the question
of structural stability when we discuss holomorphic
dynamics later.

If a family of dynamical systems parametrized by a
variable $a$ is not structurally stable, it may still be that
the system with parameter $a_0$ is topologically equivalent
to all systems with parameter $a$ in some region
that contains $a_0$. A major goal of research into dynamics
is to understand not just the qualitative structure
of each system in the family, but also the structure of
the parameter space, that is, how it is divided up into
such regions of stability. The boundaries that separate
these regions form what is called the bifurcation set:
if $a_0$ belongs to this set, then there will be parameters
$a$ arbitrarily close to $a_0$ for which the corresponding
system has a different qualitative behavior.
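
One simple qualitative feature that can change with the parameter is the number of fixed points, which is the same for conjugate systems. As an illustration (this particular computation is not in the text), the sketch below counts the real fixed points of $x \mapsto x^2 + a$, that is, the real solutions of $x^2 + a = x$, for parameters on either side of $a = \tfrac{1}{4}$.

```python
import math

def real_fixed_points(a):
    """Real solutions of x**2 + a == x, i.e. fixed points of x -> x**2 + a."""
    disc = 1 - 4 * a
    if disc < 0:
        return []
    if disc == 0:
        return [0.5]
    root = math.sqrt(disc)
    return [(1 - root) / 2, (1 + root) / 2]

counts = {a: len(real_fixed_points(a)) for a in (0.24, 0.25, 0.26)}
```

The count drops from two to one to none as $a$ passes through $\tfrac{1}{4}$, so systems with parameters on opposite sides of $\tfrac{1}{4}$ cannot be topologically equivalent, however close the parameters are.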

A description and classification of structurally stable 
systems and a classification of possible bifurcations is 
not within reach for general dynamical systems. How- 
ever, one of the success stories in the subject, holomor- 
phic dynamics, studies a special class of dynamical sys- 
tems for which many of these goals have been attained. 
It is time to turn our attention to this class. 

2 Holomorphic Dynamics 

Holomorphic dynamics is the study of discrete dynamical
systems where the map to be iterated is a holomorphic
function [I.3 §5.6] of the complex numbers
[I.3 §1.5]. Complex numbers are typically denoted by $z$.
In this article, we shall consider iterations of complex
polynomials and rational functions (that is, functions
like $(z^2 + 1)/(z^3 + 1)$ that are ratios of polynomials), but
much of what we shall say about them is true for more
general holomorphic functions, such as exponential
[III.25] and trigonometric [III.94] functions.

Whenever one restricts attention to a special kind 
of dynamical system, there will be tools that are spe- 
cially adapted to that situation. In holomorphic dynam- 
ics these tools come from complex analysis. When we 
concentrate on rational functions, there are more spe- 
cial tools, and if we restrict further to polynomials, then 
there are yet others, as we shall see. 





Why might one be interested in iterating rational
functions? One answer arose in 1879, when Cayley
[VI.46] had the idea of trying to find roots of complex
polynomials by extending Newton’s method, which we
discussed in the introduction, from real numbers to
complex numbers. Given any polynomial $P$, the corresponding
Newton function $N_P$ is a rational function,
given by the formula

$$N_P(z) = z - \frac{P(z)}{P'(z)} = \frac{zP'(z) - P(z)}{P'(z)}.$$

To apply Newton’s method, one iterates this rational
function.
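
As a small experiment in the spirit of Cayley's idea, the sketch below iterates the Newton function of the illustrative polynomial $P(z) = z^3 - 1$ (a choice made here, not taken from the text) from a few complex starting points and reports which cube root of unity each orbit approaches. The helper names and the iteration count are assumptions.

```python
import cmath

def newton_function(P, dP, z):
    """N_P(z) = z - P(z) / P'(z) for a complex argument z."""
    return z - P(z) / dP(z)

P = lambda z: z ** 3 - 1
dP = lambda z: 3 * z ** 2
# The three cube roots of unity: 1, e^{2 pi i / 3}, e^{4 pi i / 3}.
roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def limit_root(z0, steps=50):
    """Index of the root closest to the Newton iterate of z0 after `steps` steps."""
    z = z0
    for _ in range(steps):
        z = newton_function(P, dP, z)
    return min(range(3), key=lambda k: abs(z - roots[k]))

basin = [limit_root(z0) for z0 in (2 + 0j, -1 + 1j, -1 - 1j)]
```

Each of these starting points lies fairly close to a different root and is carried to it. Coloring a grid of starting points by `limit_root` is a common way to visualize how the plane is divided among the three roots.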

The study of the iteration of rational functions 
flourished at the beginning of the twentieth century, 
thanks in particular to work of Pierre Fatou and Gas- 
ton Julia (who independently obtained many of the 
same results). Part of their work concerned the study 
of the local behavior of functions in the neighborhoods 
of a fixed point. But they were also concerned about 
global dynamical properties and were inspired by the 
theory of so-called normal families, then recently estab- 
lished by Paul Montel. However, research on holomor- 
phic dynamics almost came to a stop around 1930, 
because the fractal sets that lay behind the results were 
so complicated as to be almost beyond imagination. 
The research came back to life in around 1980 with the 
vastly extended calculating powers of computers, and 
in particular the possibility of making sophisticated 
graphic visualizations of these fractal sets. Since then, 
holomorphic dynamics has attracted a lot of atten- 
tion. New techniques continue to be developed and 
introduced. 

To set the scene, let us start by looking at one of the 
simplest of polynomials, namely $z^2$.

2.1 The Quadratic Polynomial $z^2$

The dynamics of the simplest quadratic polynomial,
$Q_0(z) = z^2$, plays a fundamental role in the understanding
of the dynamics of any quadratic polynomial.
Moreover, the dynamical behavior of $Q_0$ can be
analyzed and understood completely.

If $z = re^{i\theta}$, then $z^2 = r^2 e^{2i\theta}$, so squaring a complex
number squares its modulus and doubles its argument.
Therefore, the unit circle (the set of complex numbers
of modulus 1) is mapped by $Q_0$ to itself, while a circle
of radius $r < 1$ is mapped onto a circle closer to the
origin, and a circle of radius $r > 1$ is mapped onto a
circle farther away.


Let us look more closely at what happens to the
unit circle. A typical point in the circle, $e^{i\theta}$, can be
parametrized by its argument $\theta$, which we can take to
lie in the interval $[0, 2\pi)$. When we square this number,
we obtain $e^{2i\theta}$, which is parametrized by the number
$2\theta$ if $2\theta < 2\pi$, but if $2\theta \geq 2\pi$, then we subtract $2\pi$ so
that the argument, $2\theta - 2\pi$, still lies in $[0, 2\pi)$. This is
strongly reminiscent of the dynamical system we considered
in section 1.5. In fact, if we replace the argument
$\theta$ by its modified argument $\theta/2\pi$, which amounts
to writing $e^{2\pi i\theta}$ instead of $e^{i\theta}$, then it becomes exactly
the same system. Therefore, the behavior of $z^2$ on the
unit circle is chaotic.

As for the rest of the complex plane, the origin is an
asymptotically stable fixed point, $Q_0(0) = 0$. For any
point $z_0$ inside the unit circle the iterates $z_k$ converge
to 0 as $k$ tends to infinity. For any point $z_0$ outside the
unit circle the distance $|z_k|$ between the iterates $z_k$ and
the origin tends to infinity as $k$ tends to infinity. The
set of initial points $z_0$ with bounded orbit is equal to
the closed unit disk, i.e., all points for which $|z_0| \leq 1$.
Its boundary, the unit circle, divides the complex plane
into two domains with qualitatively different dynamical
behavior.
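
The three regimes can be checked numerically. The following sketch records the moduli $|z_k|$ along an orbit of $z \mapsto z^2$ for one starting point inside, one on, and one outside the unit circle. The function name `orbit_moduli` and the sample starting points are hypothetical choices made for this example.

```python
import cmath


def orbit_moduli(z0, steps):
    """Return [|z_0|, |z_1|, ..., |z_{steps-1}|] for the orbit of z -> z^2."""
    z = z0
    moduli = []
    for _ in range(steps):
        moduli.append(abs(z))
        z = z * z
    return moduli


# Sample starting points (illustrative): same argument, different moduli.
inside = orbit_moduli(cmath.rect(0.9, 1.0), 8)
on_circle = orbit_moduli(cmath.rect(1.0, 1.0), 8)
outside = orbit_moduli(cmath.rect(1.1, 1.0), 8)

print(inside[-1], on_circle[-1], outside[-1])
```

Since $|z_k| = |z_0|^{2^k}$, the moduli shrink or grow doubly exponentially away from the unit circle and stay equal to 1 on it (up to rounding).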

Some orbits of $Q_0$ are periodic. In order to determine
which ones, we first notice that the only possibility
away from the unit circle is the fixed point at the origin,
since all other points, when you repeatedly square
them, either get steadily closer and closer to the origin,
or get steadily farther and farther away. So now
let us look at the unit circle, and consider the point
$e^{2\pi i\theta_0}$, with modified argument $\theta_0$. If this point is periodic
with period $k$, we must have $2^k\theta_0 \equiv \theta_0 \pmod 1$:
that is, $(2^k - 1)\theta_0$ must be an integer. Because of this,
it is convenient to parametrize a point on the unit circle
by its modified argument. From now on, when we say
“the point $\theta$,” we shall mean the point $e^{2\pi i\theta}$, and when
we say “argument” we shall mean modified argument.

We have just established that the point $\theta$ is periodic
with period $k$ only if $(2^k - 1)\theta$ is an integer. It
follows that there is one point of period 1, namely
$\theta_0 = 0$. There are two points of period 2, forming
one orbit, namely $\tfrac{1}{3} \leftrightarrow \tfrac{2}{3}$. There are six points of
period 3, forming two orbits, namely
$\tfrac{1}{7} \to \tfrac{2}{7} \to \tfrac{4}{7} \to \tfrac{1}{7}$
and $\tfrac{3}{7} \to \tfrac{6}{7} \to \tfrac{5}{7} \to \tfrac{3}{7}$.
(At each stage, we double the number we have, and subtract 1
if that is needed to get us back into the interval $[0, 1)$.)
The points of period 4 are fractions with denominator 15,
but the converse is not true: the fractions
$\tfrac{5}{15} = \tfrac{1}{3}$ and $\tfrac{10}{15} = \tfrac{2}{3}$ have the
lower period 2. The periodic points on the unit circle
are dense in the unit circle, meaning that arbitrarily
close to any point is a periodic point. This follows
from the observation that all repeating binary expansions,
such as 0.1100011000110001100011000... are
periodic, and any finite sequence of 0s and 1s is the
start of a repeating sequence. One can, in fact, show
that the periodic points on the unit circle are exactly
the points whose argument is a fraction $p/q$ in $[0, 1)$
with $q$ odd. Any fraction with even denominator can be
written in the form $p/(2^{\ell} q)$ for some odd number $q$.
After $\ell$ iterations, such a fraction will land on a periodic
point, so the initial point is pre-periodic. Points
with rational argument in $[0, 1)$ have a finite orbit, while
points with irrational argument have an infinite orbit.
The reason for taking modified arguments is now justified:
the behavior of the dynamics depends on whether
$\theta_0$ is rational or irrational.
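
The small periods listed above can be recomputed with exact rational arithmetic. The sketch below is one way to do it, assuming the angle-doubling description of $z \mapsto z^2$ on the unit circle; the helper names `double`, `exact_period`, and `points_of_period` are introduced only for this example.

```python
from fractions import Fraction


def double(theta):
    """Doubling map on modified arguments: theta -> 2*theta (mod 1)."""
    return (2 * theta) % 1


def exact_period(theta):
    """Return the period of theta under doubling, or None if theta is only
    pre-periodic. theta must be a Fraction in [0, 1), so its orbit is finite."""
    seen = set()
    x = theta
    while x not in seen:
        seen.add(x)
        x = double(x)
    # The first repeated point is theta itself exactly when theta is periodic.
    return len(seen) if x == theta else None


def points_of_period(k):
    """All arguments in [0, 1) whose exact period is k."""
    q = 2**k - 1
    candidates = {Fraction(p, q) for p in range(q)}
    return sorted(t for t in candidates if exact_period(t) == k)


print(points_of_period(2))
print(points_of_period(3))
print(exact_period(Fraction(5, 15)), exact_period(Fraction(1, 6)))
```

Using `Fraction` rather than floating point matters here: doubling a float discards one binary digit of information at each step, so a floating-point orbit soon stops tracking the true one.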

When $\theta_0$ is irrational its orbit may or may not be
dense in $[0, 1)$. This is another fact that is easy to see
if one considers binary expansions. For instance, a very
special example of a $\theta_0$ with a dense orbit is given by
the binary expansion

$\theta_0 = 0.0100011011000001010011100101110111\ldots$
where one obtains this expansion by simply listing all
finite binary sequences in turn: first the blocks of length
one, 0 and 1, then the blocks of length two, 00, 01, 10,
and 11, and so on. When we iterate, this binary expansion
shifts to the left and all possible finite sequences
appear at some time or another at the beginning of
some iterate $\theta_k$.
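
The construction of this expansion is easy to reproduce. The sketch below builds the digit string from all binary blocks of length up to a chosen bound and finds, for a given block, the first shift at which that block appears at the start. The names `all_blocks_expansion` and `first_shift` are hypothetical, and only a finite prefix of the infinite expansion is generated.

```python
from itertools import product


def all_blocks_expansion(max_len):
    """Concatenate every binary block of length 1, 2, ..., max_len in order."""
    return "".join(
        "".join(bits)
        for n in range(1, max_len + 1)
        for bits in product("01", repeat=n)
    )


def first_shift(digits, block):
    """Number of doublings after which the expansion starts with `block`
    (-1 if the block does not occur in this finite prefix)."""
    return digits.find(block)


digits = all_blocks_expansion(3)
print("0." + digits)
print(first_shift(digits, "11"), first_shift(digits, "101"))
```

Each doubling deletes the leading binary digit, so an iterate $\theta_k$ begins with a given block exactly when that block occurs at position $k$ of the digit string.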

2.2 Characterization of Periodic Points 

Let $z_0$ be a fixed point of a holomorphic map $F$. How
do the iterates of points near $z_0$ behave? The answer
depends crucially on a number $\rho$, called the multiplier
of the fixed point, which is defined to be $F'(z_0)$. To see
why this is relevant, notice that if $z$ is very close to $z_0$,
then $F(z)$ is, to a first-order approximation, equal to
$F(z_0) + F'(z_0)(z - z_0) = z_0 + \rho(z - z_0)$. Thus, when
you apply $F$ to a point near $z_0$, its difference from $z_0$
is approximately multiplied by $\rho$. If $|\rho| < 1$, then nearby
points will get closer to $z_0$, in which case $z_0$ is called
an attracting fixed point. If $\rho = 0$, then this happens
very quickly and $z_0$ is called super-attracting. If $|\rho| > 1$,
then nearby points get farther away and $z_0$ is called
repelling. Finally, if $|\rho| = 1$, then one says that $z_0$ is
indifferent.
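
The classification depends only on the modulus of the multiplier, so it can be written as a short function. This is a minimal sketch: the name `classify_fixed_point` and the tolerance `tol` (needed because floating-point moduli are rarely exactly 1) are assumptions of the example rather than part of the theory.

```python
def classify_fixed_point(multiplier, tol=1e-12):
    """Classify a fixed point from its multiplier F'(z_0).

    tol is an illustrative tolerance for comparing |multiplier| with 0 and 1.
    """
    m = abs(multiplier)
    if m <= tol:
        return "super-attracting"
    if m < 1 - tol:
        return "attracting"
    if m > 1 + tol:
        return "repelling"
    return "indifferent"


# Fixed points of z -> z^2 are 0 and 1; the derivative there is 2z.
print(classify_fixed_point(2 * 0))
print(classify_fixed_point(2 * 1))
```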


If $z_0$ is indifferent, then its multiplier will take the
form $\rho = e^{2\pi i\theta}$, and near $z_0$ the map $F$ will be approximately
a rotation about $z_0$ by an angle of $2\pi\theta$. The
behavior of the system depends very much on the precise
value of $\theta$. We call the fixed point rationally or irrationally
indifferent if $\theta$ is rational or irrational, respectively.
The dynamics is not yet completely understood
in all irrational cases.

A periodic point $z_0$ of period $k$ will be a fixed point of
the $k$th iterate $F^k = F \circ \cdots \circ F$ of $F$. For this reason we
define its multiplier by $\rho = (F^k)'(z_0)$. It follows from
the chain rule that

$$(F^k)'(z_0) = \prod_{j=0}^{k-1} F'(z_j),$$

where $z_j = F^j(z_0)$ are the points of the orbit,
and therefore that the derivative of $F^k$ is the same at all
points of the periodic orbit. This formula also implies
that a super-attracting periodic orbit must contain a
critical point (that is, a point where the derivative of $F$
is zero): if $(F^k)'(z_0) = 0$, then at least one $F'(z_j)$ must
be 0.

Note that 0 is a super-attracting fixed point of $Q_0$,
and that any periodic orbit of $Q_0$ of period $k$ on the
unit circle has multiplier $2^k$. All periodic orbits on the
unit circle are therefore repelling.
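
The chain-rule product can be evaluated directly along an orbit. The sketch below does this for the period-3 orbit of $z \mapsto z^2$ through the point with modified argument $1/7$, and confirms that the product is the same from two different points of the orbit. The function name `orbit_multiplier` is an assumption of this example.

```python
import cmath


def orbit_multiplier(f, df, z0, k):
    """Product of df along the first k points of the orbit of z0 under f,
    which equals (f^k)'(z0) by the chain rule."""
    rho = 1.0
    z = z0
    for _ in range(k):
        rho *= df(z)
        z = f(z)
    return rho


q0 = lambda z: z * z      # the map z -> z^2
dq0 = lambda z: 2 * z     # its derivative

z0 = cmath.exp(2j * cmath.pi / 7)   # modified argument 1/7, period 3
z1 = q0(z0)                          # next point of the same orbit

rho_at_z0 = orbit_multiplier(q0, dq0, z0, 3)
rho_at_z1 = orbit_multiplier(q0, dq0, z1, 3)
print(rho_at_z0, rho_at_z1)
```

Here the product is $2^3 z_0 z_1 z_2$, and the arguments $1/7 + 2/7 + 4/7$ sum to an integer, so the result is $2^3 = 8$ up to rounding.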

2.3 A One-Parameter Family of Quadratic 
Polynomials 

The quadratic polynomial $Q_0$ sits at the center of the
one-parameter family of quadratic polynomials of the
form $Q_c(z) = z^2 + c$. (We considered this family earlier,
but then $z$ and $c$ were real rather than complex.)
For each fixed complex number $c$ we are interested in
the dynamics of the polynomial $Q_c$ under iteration. The
reason we do not need to study more general quadratic
polynomials is that they can be brought into this form
by a simple substitution $w = az + b$, similar to the
substitution in the real example in section 1.6. For any
given quadratic polynomial $P$ we can find exactly one
substitution $w = az + b$ and one $c$ such that

$$a(P(z)) + b = (az + b)^2 + c \quad \text{for all } z.$$

Therefore, if we understand the dynamics of the polynomials
$Q_c$, then we understand the dynamics of all
quadratic polynomials.
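
To make the substitution concrete, write $P(z) = \alpha z^2 + \beta z + \gamma$ with $\alpha \neq 0$. Comparing coefficients in $aP(z) + b = (az + b)^2 + c$ gives $a = \alpha$, $b = \beta/2$, and $c = \alpha\gamma + b - b^2$. The sketch below computes these values and checks the identity at a sample point. The coefficient names and the function `conjugate_to_qc` are introduced here for illustration only.

```python
def conjugate_to_qc(alpha, beta, gamma):
    """For P(z) = alpha*z^2 + beta*z + gamma with alpha != 0, return (a, b, c)
    such that a*P(z) + b == (a*z + b)**2 + c for all z."""
    a = alpha
    b = beta / 2
    c = alpha * gamma + b - b * b
    return a, b, c


# Illustrative polynomial: P(z) = 2z^2 + 3z - 1.
alpha, beta, gamma = 2, 3, -1
a, b, c = conjugate_to_qc(alpha, beta, gamma)

P = lambda z: alpha * z * z + beta * z + gamma
z = 0.3 + 0.7j
lhs = a * P(z) + b
rhs = (a * z + b) ** 2 + c
print(a, b, c, abs(lhs - rhs))
```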

There are other representative families of quadratic
polynomials that can be useful. One example is the
family $F_\lambda(z) = \lambda z + z^2$. The substitution $w = z + \tfrac{1}{2}\lambda$
changes $F_\lambda$ into $Q_c$, where $c = \tfrac{1}{2}\lambda - \tfrac{1}{4}\lambda^2$. We shall
return to the expression of $c$ in terms of $\lambda$ later on. In
the family of polynomials $Q_c$, the parameter $c = Q_c(0)$
coincides with the only critical value of $Q_c$ in the plane:
as we shall see later, critical orbits play an essential
role in the analysis of the global dynamics. In the family
of polynomials $F_\lambda$ the parameter $\lambda$ is equal to the
multiplier of the fixed point at the origin of $F_\lambda$, which
sometimes makes this family more convenient.
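
As a check on the formula for $c$, the substitution can be carried out explicitly. The short derivation below is a sketch that writes $\lambda$ for the parameter of the family $z \mapsto \lambda z + z^2$ and uses only the definitions given in this section.

```latex
\begin{align*}
w &= z + \tfrac{1}{2}\lambda, \\
F_\lambda(z) + \tfrac{1}{2}\lambda &= z^2 + \lambda z + \tfrac{1}{2}\lambda, \\
w^2 + c &= z^2 + \lambda z + \tfrac{1}{4}\lambda^2 + c, \\
\text{so}\quad c &= \tfrac{1}{2}\lambda - \tfrac{1}{4}\lambda^2 .
\end{align*}
```

The statement about the multiplier follows in the same way: $F_\lambda(0) = 0$ and $F_\lambda'(z) = \lambda + 2z$, so $F_\lambda'(0) = \lambda$.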

2.4 The Riemann Sphere 

To understand further the dynamics of polynomials
it is best to regard them as a special case of rational
functions. Since a rational function can sometimes be
infinite, the natural space to consider is not the complex
plane $\mathbb{C}$ but the extended complex plane, which is
the complex plane together with the point “$\infty$.” This
space is denoted $\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$. A geometrical picture
(see figure 1) is obtained by identifying the extended
complex plane with the Riemann sphere. This is simply
the unit sphere $\{(x_1, x_2, x_3) : x_1^2 + x_2^2 + x_3^2 = 1\}$
in three-dimensional space. Given a number $z$ in the
complex plane, the straight line joining $z$ to the north
pole $N = (0, 0, 1)$ intersects this sphere in exactly one
place (apart from $N$ itself). This place is the point in
the sphere that is associated with $z$. Notice that the
bigger $|z|$ is, the closer the associated point is to $N$. We
therefore regard $N$ as corresponding to the point $\infty$.
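
The point of the sphere associated with $z$ can be written down explicitly. The sketch below assumes the complex plane is embedded as the plane $x_3 = 0$, with $z = x + iy$ placed at $(x, y, 0)$; intersecting the line through $N$ and that point with the unit sphere gives the formula used in the code. The function name `to_sphere` is hypothetical.

```python
def to_sphere(z):
    """Map a complex number z = x + iy to the point where the line through
    N = (0, 0, 1) and (x, y, 0) meets the unit sphere again."""
    x, y = z.real, z.imag
    d = x * x + y * y + 1.0          # |z|^2 + 1
    return (2 * x / d, 2 * y / d, (d - 2.0) / d)


for z in (0j, 1 + 0j, 3 - 4j, 100 + 0j):
    print(z, to_sphere(z))
```

The third coordinate is $(|z|^2 - 1)/(|z|^2 + 1)$, which increases toward 1 as $|z|$ grows, in line with the remark that large $|z|$ corresponds to points near $N$.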

Let us now think of $Q_0(z) = z^2$ as a function from
$\hat{\mathbb{C}}$ to $\hat{\mathbb{C}}$. We have seen that 0 is a super-attracting fixed
point of $Q_0$. What about $\infty$, which is a fixed point as
well? The classification we gave in terms of multipliers
does not work at $\infty$, but a standard trick in this situation
is to “move” $\infty$ to 0. If one wishes to understand the
behavior of a function $f$ with a fixed point at $\infty$, one can
look instead at the function $g(z) = 1/f(1/z)$, which
has a fixed point at 0 (since $1/f(1/0) = 1/f(\infty) = 1/\infty = 0$).
When $f(z) = z^2$, $g(z)$ is also $z^2$, so $\infty$ is
also a super-attracting fixed point of $Q_0$.

Figure 2 The Douady rabbit. The filled Julia set of $Q_{c_0}$,
where $c_0$ is the one root of the polynomial $(c^2 + c)^2 + c$
that has positive imaginary part. This corresponds to one
of the three possible $c$ values for which the critical orbit
$0 \mapsto c \mapsto c^2 + c \mapsto (c^2 + c)^2 + c = 0$ is periodic of period 3.
The critical orbit is marked with three white dots inside the
filled Julia set: 0 in the black, $c_0$ in the light gray, and $c_0^2 + c_0$
in the gray. The corresponding three attracting basins of
$Q_{c_0}^3$ are marked in black, light gray, and gray, respectively.
The Julia set is the common boundary of the black, light
gray, and gray basins of attraction as well as of $A_{c_0}(\infty)$.

In general, if $P$ is any nonconstant polynomial, then
it is natural to define $P(\infty)$ to be $\infty$. Applying the above
trick, we obtain a rational function. For example, if
$P(z) = z^2 + 1$, then $1/P(1/z) = z^2/(z^2 + 1)$. If $P$ has
degree at least 2, then $\infty$ is a super-attracting fixed
point.
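
The trick of moving $\infty$ to 0 can be tried numerically. The sketch below forms $g(z) = 1/f(1/z)$ for a given function and compares it with the closed form $z^2/(z^2 + 1)$ for $P(z) = z^2 + 1$. The helper name `conjugate_by_inversion` is an assumption of this example, and the code handles only points $z \neq 0$ at which $f(1/z) \neq 0$.

```python
def conjugate_by_inversion(f):
    """Return g with g(z) = 1 / f(1 / z), for z != 0 and f(1/z) != 0."""
    def g(z):
        return 1 / f(1 / z)
    return g


P = lambda z: z * z + 1
g = conjugate_by_inversion(P)

z = 0.2 + 0.1j
print(g(z), z * z / (z * z + 1))

# Near 0 the ratio g(h)/h is tiny, consistent with g'(0) = 0.
h = 1e-6
print(abs(g(h) / h))
```

Since $g(z)$ behaves like $z^2$ near 0, the fixed point of $g$ at 0 has multiplier 0, which is what it means for $\infty$ to be super-attracting for $P$.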

The connection between $\hat{\mathbb{C}}$ and rational functions is
expressed by the following fact: a function $F : \hat{\mathbb{C}} \to \hat{\mathbb{C}}$
is holomorphic everywhere (with suitable definitions
at $\infty$) if and only if it is a rational function. This is
not obvious, but is typically proved in a first course
in complex analysis. Among the rational functions, the
polynomials are the ones for which
$F(\infty) = \infty = F^{-1}(\infty)$.

A polynomial $P$ of degree $d$ has $d - 1$ critical points
in the plane (not including $\infty$). These are the roots of
the derivative $P'$, counted with multiplicity. The critical
point at $\infty$ has multiplicity $d - 1$, as can again be seen
by looking at the map $1/P(1/z)$. In particular, quadratic
polynomials have exactly one critical point in the plane.
The degree of a rational function $P/Q$ (where $P$ and $Q$
have no common roots) is defined to be the maximal
degree of the polynomials $P$ and $Q$. A rational function
of degree $d$ has $2d - 2$ critical points in $\hat{\mathbb{C}}$, as we have
just seen for polynomials.

2.5 Julia Sets of Polynomials 

It can be shown that the only invertible holomorphic 
maps from $\mathbb{C}$ to $\mathbb{C}$ are polynomials of degree 1, that is,
functions of the form $az + b$ with $a \neq 0$. The dynamical
behavior of these maps is easy to analyze, simple, and 
hence not interesting. 

From now on, therefore, we shall consider only polynomials
$P$ of degree at least 2. For all such polynomials,
$\infty$ is a super-attracting fixed point, from which
it follows that the plane is split into two disjoint sets
with qualitatively different dynamics, one consisting of
points that are attracted to $\infty$ and the other consisting
of points that are not. The attracting basin of $\infty$,
denoted by $A_P(\infty)$, consists of all initial points $z$ such
that $P^k(z) \to \infty$ as $k \to \infty$. (Here, $P^k(z)$ stands for the
result of applying $P$ to $z$ $k$ times.) The complement of
$A_P(\infty)$ is called the filled Julia set, and is denoted by $K_P$.
It can be defined as the set of all points $z$ such that the
sequence $z, P(z), P^2(z), P^3(z), \ldots$ is bounded. (It is not
hard to show that sequences of this kind either tend to
$\infty$ or are bounded.)
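
The dichotomy between bounded and escaping orbits suggests a simple numerical membership test for the filled Julia set of $z \mapsto z^2 + c$. The sketch below is a heuristic rather than a proof: it relies on the standard estimate (not derived in this article) that once $|z| > \max(2, |c|)$ the orbit tends to $\infty$, and it reports a point as belonging to the set merely because it has not escaped within `max_iter` steps. The function name and parameters are assumptions of the example.

```python
def in_filled_julia(z, c, max_iter=200):
    """Heuristic test: True if the orbit of z under w -> w^2 + c has not left
    the disk of radius max(2, |c|) within max_iter steps."""
    radius = max(2.0, abs(c))   # escape radius (standard estimate, assumed)
    for _ in range(max_iter):
        if abs(z) > radius:
            return False        # the orbit certainly tends to infinity
        z = z * z + c
    return True                 # no escape detected; bounded as far as we can tell


print(in_filled_julia(0.5, 0), in_filled_julia(1.5, 0))
print(in_filled_julia(0.0, -2), in_filled_julia(2.1, -2))
```

A `False` answer is rigorous under the stated estimate, while a `True` answer only means that no escape was observed, so points very close to the boundary may need a larger `max_iter`.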

The attracting basin of $\infty$ is an open set and the
filled Julia set is a closed, bounded set (i.e., a compact
set [III.9]). The attracting basin of $\infty$ is always
connected. For this reason the boundary of $K_P$ is equal
to the boundary of $A_P(\infty)$. The common boundary is
called the Julia set of $P$ and is denoted by $J_P$. The
three sets $K_P$, $A_P(\infty)$, and $J_P$ are completely invariant,
i.e., $P(K_P) = K_P = P^{-1}(K_P)$, and so on. If we replace $P$
by any iterate $P^k$, then the filled Julia set, the attracting
basin of $\infty$, and the Julia set of $P^k$ are the same sets as
those of $P$.

For the polynomial $Q_0$, we showed earlier that the
filled Julia set is the closed unit disk, $\{z : |z| \leq 1\}$; the
attracting basin of $\infty$ is its complement, $\{z : |z| > 1\}$;
and the Julia set is the unit circle, $\{z : |z| = 1\}$.

The name “filled Julia set” refers to the fact that
$K_P$ is equal to $J_P$ with all its holes (or, more formally,
the bounded components of its complement) filled in. 
The complement of the Julia set is called the Fatou set 
and any connected component of it is called a Fatou 
component. 

Figures 2-6 show different examples of Julia sets
of quadratic polynomials $Q_c$. For simplicity we set
$K_{Q_c} = K_c$, $A_{Q_c}(\infty) = A_c(\infty)$, and $J_{Q_c} = J_c$. Note that all
Julia sets $J_c$ are symmetric around 0, owing to the symmetry
in the formula: $Q_c(-z) = Q_c(z)$, which implies
that if a point $z$ belongs to $J_c$, then so does $-z$.

Figure 3 The Julia set of $Q_{1/4}$. Every point inside the
Julia set (including the critical point 0) is attracted (under
repeated applications of $Q_{1/4}$) to the rationally indifferent
fixed point $\tfrac{1}{2}$ with multiplier $\rho = 1$, which belongs to $J_{1/4}$.

Figure 4 The Julia set of $Q_c$ with a so-called Siegel disk
around an irrationally indifferent fixed point of multiplier
$\rho = e^{2\pi i(\sqrt{5}-1)/2}$. The corresponding $c$-value is equal to
$\tfrac{1}{2}\rho - \tfrac{1}{4}\rho^2$. In the Siegel disk, the Fatou component containing
the fixed point, the action of $Q_c$ can, after a suitable
change of variables, be expressed as $w \mapsto \rho w$. The fixed
point is marked and so are some orbits of points in its vicinity.
The critical orbit is dense in the boundary of the Siegel
disk.

2.6 Properties of Julia Sets 

In this section we shall list several common properties 
of Julia sets. The proofs of these, which are beyond the 
scope of this article, mostly depend on the theory of 
normal families.

• The Julia set is the set of points for which the
system displays sensitivity to initial conditions, 
i.e., the chaotic subset of the dynamical system. 

• The repelling orbits belong to the Julia set and 
form a dense subset of the set. That is, any point in 
the Julia set can be approximated arbitrarily well 
by a repelling point. This is the definition origi- 
nally used by Julia. (Of course, the name “Julia set” 
was used only later.) 

• For any point z in the Julia set, the set of iterated 
preimages $\bigcup_{k=1}^{\infty} F^{-k}(z)$ forms a dense subset of
the Julia set. This property is used when one is 
making computer pictures of Julia sets. 

• In fact, for any point $z$ in $\hat{\mathbb{C}}$ (with at most one or
two exceptions), the closure of the set of iterated 
preimages contains the Julia set. 

• For any point $z$ in the Julia set and any neighborhood
$U_z$ of $z$, the iterated images $F^k(U_z)$ cover all
of $\hat{\mathbb{C}}$ except at most one or two exceptional points.
This property demonstrates an extreme sensitivity 
to initial conditions. 

• If $\Omega$ is a union of Fatou components that is completely
invariant (that is, $F(\Omega) = \Omega = F^{-1}(\Omega)$),
then the boundary of $\Omega$ coincides with the Julia
set. This justifies the definition of the Julia set
of a polynomial as the boundary of the attracting
basin of $\infty$. Compare also with figure 2, where the
attracting basins of $Q_{c_0}^3$ and $A_{c_0}(\infty)$ are examples
of such completely invariant sets.

• The Julia set is either connected or consists of un- 
countably many connected components. An exam- 
ple of the latter is shown in figure 6. 

• The Julia set is typically a fractal: when one zooms 
in on it, one finds that the complication of the set 
is repeated at all scales. It is also self-similar, in the 
following sense: for any noncritical point z in the 
Julia set, any sufficiently small neighborhood $U_z$
of $z$ is mapped bijectively onto $F(U_z)$, a neighborhood
of $F(z)$. The Julia set in $U_z$ and the Julia set
in $F(U_z)$ look alike.

All but the last two properties can easily be verified
in the example $Q_0$. In this case the exceptional points
are 0 and $\infty$.
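
The density of iterated preimages is what makes backward iteration a practical way to plot Julia sets. The sketch below is one possible version for $z \mapsto z^2 + c$: it starts from a fixed point that is repelling (or, for $c = 1/4$, indifferent) and therefore lies in the Julia set, and repeatedly replaces $z$ by one of its two preimages $\pm\sqrt{z - c}$ chosen at random. The random choice, the seed, and the function name `inverse_iteration` are assumptions of this example, not a method prescribed by the text.

```python
import cmath
import random


def inverse_iteration(c, n_points, seed=0):
    """Return n_points points of the Julia set of w -> w^2 + c, obtained by
    following randomly chosen preimages of a fixed point in the Julia set."""
    rng = random.Random(seed)
    # Fixed point (1 + sqrt(1 - 4c)) / 2: its multiplier 1 + sqrt(1 - 4c)
    # has modulus at least 1, so it is never attracting.
    z = (1 + cmath.sqrt(1 - 4 * c)) / 2
    points = []
    for _ in range(n_points):
        w = cmath.sqrt(z - c)              # one of the two preimages of z
        z = w if rng.random() < 0.5 else -w
        points.append(z)
    return points


circle_points = inverse_iteration(0, 1000)
segment_points = inverse_iteration(-2, 1000)
print(max(abs(abs(p) - 1) for p in circle_points))
print(min(p.real for p in segment_points), max(p.real for p in segment_points))
```

For $c = 0$ every sampled point lies on the unit circle, and for $c = -2$ every sampled point lies in the interval $[-2, 2]$. Random backward iteration tends to visit some parts of a Julia set far more often than others, so pictures made this way can be unevenly filled.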

2.7 Böttcher Maps and Potentials

2.7.1 Böttcher Maps

Consider the quadratic polynomial $Q_{-2}(z) = z^2 - 2$.
If $z$ belongs to the interval $[-2, 2]$, then $z^2$ belongs to
the interval $[0, 4]$, so $Q_{-2}(z)$ also belongs to the interval
$[-2, 2]$. It follows that this interval is contained in
the filled Julia set $K_{-2}$.

Figure 5 (a) Some equipotentials and external rays $R_0(\theta)$
of $Q_0$ in $A_0(\infty)$, the set of complex numbers of modulus
greater than 1. (b) The corresponding equipotentials and
external rays $R_{-2}(\theta)$ of $Q_{-2}$ in $A_{-2}(\infty)$, the set of complex
numbers not in $K_{-2} = J_{-2} = [-2, 2]$. The external rays that
are drawn have arguments $\theta = \tfrac{1}{12}p$, where $p = 0, 1, \ldots, 11$.
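
The invariance of the interval $[-2, 2]$ under $z \mapsto z^2 - 2$ is easy to confirm numerically. The sketch below checks it on an evenly spaced grid and then follows each grid point for a number of iterations. The grid size, the iteration count, and the names used are arbitrary choices made for this example.

```python
def q_minus_2(x):
    """The real map x -> x^2 - 2."""
    return x * x - 2


# 1001 evenly spaced sample points in [-2, 2] (illustrative grid).
samples = [-2 + 4 * i / 1000 for i in range(1001)]

one_step_ok = all(-2 <= q_minus_2(x) <= 2 for x in samples)


def stays_in_interval(x, steps=100):
    """True if the first `steps` iterates of x all lie in [-2, 2]."""
    for _ in range(steps):
        x = q_minus_2(x)
        if not -2 <= x <= 2:
            return False
    return True


all_orbits_ok = all(stays_in_interval(x) for x in samples)
print(one_step_ok, all_orbits_ok)
```

A finite check of this kind only illustrates the argument in the text, which proves the invariance for every point of the interval.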

The polynomial $Q_{-2}(z)$ is not topologically equivalent
to $Q_0(w) = w^2$, but when $z$ is big enough, it
behaves in a similar way, since 2 is small compared



